[PATCH] [PATCH] arm64: Boot failure on m400 with new cont PTEs

Will Deacon will.deacon at arm.com
Mon Nov 23 07:41:22 PST 2015


On Mon, Nov 23, 2015 at 08:48:40AM -0600, Jeremy Linton wrote:
> On 11/23/2015 07:49 AM, Mark Rutland wrote:
> >Jeremy, for reference, have you tried kasan on m400? Or DEBUG_RODATA?
> No, because the machine has a list of issues that keep it from booting a
> mainline kernel in a functional state. Once those are cleared up I will
> revisit this patch.
> The goal was to create a conceptually safe fix for the problem, which isn't
> all this hypothetical stuff being discussed, but the fact that the TLBs are
> not being flushed properly (with or without the CONT bit stuff) resulting in
> a tlb conflict fault long after this code path has finished executing.

I appreciate that you just want to get something working, and we've
established that a TLBIALL probably does solve the problem, however there's
more to it than that.

If we start adding TLBI/DSB/ISB/NOP/cache flush/read from config space
type operations in arbitrary places, we run a very real risk of masking
future bugs. Then, when the original problem that prompted the temporary
bodge is fixed properly, we uncover a whole raft of problems that we didn't
even realise were there. Consequently, we get stuck with the bodge forever
and, crucially, we lose the ability to reason about the state of the CPU
under Linux. That reasoning is incredibly useful when developing new
architectural features, debugging, profiling or assessing whether or
not Linux is susceptible to hardware errata and is precisely why the
"hypothetical stuff" matters.

For these reasons, we have no viable option other than reverting the
offending series until the underlying problem is fixed properly. With
any luck, that's the 4.5 timeframe so we really only lose a single
release (providing that you have time to rebase and repost).

Sorry about this; I hope it doesn't dissuade you from reporting bugs in
the future.


More information about the linux-arm-kernel mailing list