[PATCH] arm64: Boot failure on m400 with new cont PTEs

Mark Rutland mark.rutland at arm.com
Mon Nov 23 05:49:12 PST 2015


On Mon, Nov 23, 2015 at 12:15:15PM +0000, Catalin Marinas wrote:
> On Fri, Nov 20, 2015 at 07:52:44PM +0000, Mark Rutland wrote:
> > On Thu, Nov 19, 2015 at 11:31:34AM +0000, Mark Rutland wrote:
> > > I think that if we need to do something more drastic to account for the
> > > other issues above (e.g. by ensuring that we can never allocate
> > > conflicting TLB entries in the first place), and that said strategy
> > > would also fix this problem, that would be preferable, given that we're
> > > going to have to do that eventually anyway.
> > 
> > Having looked into this further, we also have the same issue with the
> > kasan init code.
> 
> I don't think the kasan_init() problem is that bad. We are preserving
> the same size mappings (PAGE_SIZE) and just changing the physical
> address they point at without a break-before-make (just a TTBR1 switch).

Per the ARM ARM, "CONSTRAINED UNPREDICTABLE behaviors due to caching of
control or data values", the result of a translation could be "an
amalgamation" of the values. I believe that we have to read
"amalgamation" as "arbitrary function of" here.

I don't think that we're safe merely because we only changed the output
addresses of entries.

> I don't know how clear the ARM ARM is around this but at least so far we
> haven't hit any problems.

I assume you're talking generally here, rather than specifically about
kasan. I agree that we haven't spotted any issues so far.

Given that kasan itself is new and requires a relatively new compiler,
it may not yet have been tested on a platform on which it would fail.

Jeremy, for reference, have you tried kasan on m400? Or DEBUG_RODATA?

> The problem with the contiguous bit is that we switch from e.g. a 4KB
> mapping to a 64KB one and it's very likely that we would get a TLB
> conflict.
> 
> With CONFIG_DEBUG_RODATA, we go from bigger block to a smaller one, so
> less chance of a TLB conflict but still present. I need to read the ARM
> ARM some more in this area (and maybe ask for clarification).

We should certainly try to get clarification here.

> > I believe that the issue is restricted to one-off init code, as I don't
> > think that we do anything at runtime which would be problematic. If
> > anyone knows of a counter-example, please let me know!
> > 
> > Given that, we can restrict the problem to an early UP environment, and
> > it won't matter if there's some large(ish) fixed cost associated with
> > updating the kernel page tables. I think that we can avoid the issue
> > entirely by modifying a copy of the kernel page tables, which we can
> > later install via some idmap code (going via a reserved table to clear
> > the TLBs).
> > 
> > I'm working on patches to implement the above, which I'll try to get
> > somewhere with next week.
> 
> That's a complete fix indeed but it would require some more testing and
> I don't think it's feasible for 4.4-rc. In the meantime, I propose that
> we revert the contiguous PTE patches and push them again once we fix the
> TLB conflict problems.

I agree that this would be too late for v4.4-rc*.

In the meantime, I guess that reverting the patches is the best thing to
do given we're already at rc2.
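
For illustration only, the copy-and-switch idea quoted above can be
sketched as a toy user-space model. This is not kernel code: every name
below (struct cpu, translate(), demo_conflict(), and so on) is
hypothetical, and the "conflict" flag is simply my assumption of how to
model two overlapping, differing entries being cached at once. It shows
a 4KB-to-64KB rewrite of the live table against a rewrite of a copy that
is installed via an empty reserved table and a full TLB flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_PAGES  16	/* one 64KB range of 4KB pages */
#define TLB_SLOTS 8

/* Toy translation covering virtual pages [base, base + span). */
struct entry {
	bool valid;
	unsigned int base;	/* first virtual page covered */
	unsigned int span;	/* 1 = 4KB page, 16 = 64KB contiguous range */
	unsigned int out;	/* output page that base maps to */
};

struct table { struct entry pte[NR_PAGES]; };

struct cpu {
	const struct table *ttbr;	/* currently installed table */
	struct entry tlb[TLB_SLOTS];
	unsigned int next_slot;
	bool conflict;	/* overlapping, differing entries were cached */
};

static bool covers(const struct entry *e, unsigned int vpage)
{
	return e->valid && vpage >= e->base && vpage < e->base + e->span;
}

static bool conflicts(const struct entry *a, const struct entry *b)
{
	bool overlap = a->base < b->base + b->span && b->base < a->base + a->span;
	bool same = a->base == b->base && a->span == b->span && a->out == b->out;

	return a->valid && b->valid && overlap && !same;
}

static void tlb_flush_all(struct cpu *c)
{
	memset(c->tlb, 0, sizeof(c->tlb));
}

/* Use a cached entry if one covers vpage, else walk the table and cache. */
static bool translate(struct cpu *c, unsigned int vpage, unsigned int *out)
{
	const struct entry *e;
	unsigned int i;

	assert(vpage < NR_PAGES);
	for (i = 0; i < TLB_SLOTS; i++) {
		if (covers(&c->tlb[i], vpage)) {
			*out = c->tlb[i].out + (vpage - c->tlb[i].base);
			return true;
		}
	}
	e = &c->ttbr->pte[vpage];
	if (!e->valid)
		return false;
	for (i = 0; i < TLB_SLOTS; i++)
		if (conflicts(&c->tlb[i], e))
			c->conflict = true;
	c->tlb[c->next_slot] = *e;
	c->next_slot = (c->next_slot + 1) % TLB_SLOTS;
	*out = e->out + (vpage - e->base);
	return true;
}

/* Identity-map every page using entries that each cover span pages. */
static void fill(struct table *t, unsigned int span)
{
	unsigned int v;

	for (v = 0; v < NR_PAGES; v++)
		t->pte[v] = (struct entry){ true, v - v % span, span, v - v % span };
}

/* Returns true if the toy TLB ever held conflicting entries. */
bool demo_conflict(bool in_place)
{
	static const struct table reserved;	/* all entries invalid */
	struct table live, copy;
	struct cpu c = { .ttbr = &live };
	unsigned int out;

	fill(&live, 1);			/* start with 4KB mappings */
	translate(&c, 0, &out);		/* caches a 4KB entry for page 0 */
	if (in_place) {
		fill(&live, NR_PAGES);	/* rewrite the live table as 64KB */
	} else {
		fill(&copy, NR_PAGES);	/* build the new layout in a copy */
		c.ttbr = &reserved;	/* nothing new can be cached now */
		tlb_flush_all(&c);	/* drop entries from the old table */
		c.ttbr = &copy;		/* install the finished copy */
	}
	translate(&c, 5, &out);		/* misses, walks, caches a 64KB entry */
	return c.conflict;
}
```

In this model, demo_conflict(true) reports a conflict because the stale
4KB entry for page 0 is still cached when the 64KB entry is loaded,
whereas demo_conflict(false) does not. Real hardware behaviour is of
course governed by the ARM ARM rather than by this sketch.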

Thanks,
Mark.


