[PATCH] arm64: Boot failure on m400 with new cont PTEs

Jeremy Linton jeremy.linton at arm.com
Wed Nov 18 08:08:58 PST 2015


On 11/18/2015 09:20 AM, Mark Rutland wrote:
> Hi Jeremy,
>
> On Wed, Nov 18, 2015 at 09:03:19AM -0600, Jeremy Linton wrote:
>> The HP m400 fails to boot the linux 4.4rc1 kernel.
>
> Are you using defconfig? If not, can you share your config?
	No, it's not defconfig; it's roughly the RHELSA config tossed into a
mainline 4.4 tree with all the default options selected. AFAIK RHELSA is
still limited access.

>
>> It usually hangs or sometimes takes an unhandled exception around the
>> DMA zone messages. This was bisected to the new CONT PTE changes.
>
> Do you have any examples of the unhandled exception cases? Are they a
> mixed bag, or a consistent exception class?

I'm guessing about 90% of the time it's a dead hang. The remainder are
faults, one of which happens more frequently than the others. Here is
one I found in my notes:

[    0.000000] On node 0 totalpages: 1048512
[    0.000000]   DMA zone: 64 pages used for memmap
[    0.000000]   DMA zone: 0 pages reserved
[    0.000000]   DMA zone: 65472 pages, LIFO batch:1
[    0.000000] Unhandled fault: unknown 48 (0x96000070) at 0xfffffe0000d60588
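
For reference, the syndrome value in that log line can be decoded by
hand. The following is a minimal sketch, assuming the ARMv8-A ESR_ELx
layout (EC in bits [31:26], IL in bit 25, and for data aborts WnR in
bit 6 and the fault status code in bits [5:0]). The helper names are
hypothetical and are not kernel functions.

```c
#include <stdint.h>

/* Hypothetical helpers: extract ESR_ELx fields for a data abort. */

/* Exception class, bits [31:26]. */
static inline unsigned int esr_ec(uint32_t esr)
{
	return (esr >> 26) & 0x3f;
}

/* Instruction length bit, bit 25. */
static inline unsigned int esr_il(uint32_t esr)
{
	return (esr >> 25) & 0x1;
}

/* Write-not-read, bit 6 of the ISS (valid for data aborts). */
static inline unsigned int esr_wnr(uint32_t esr)
{
	return (esr >> 6) & 0x1;
}

/* Data fault status code, bits [5:0] of the ISS. */
static inline unsigned int esr_dfsc(uint32_t esr)
{
	return esr & 0x3f;
}
```

Applied to 0x96000070, these give EC = 0x25 (a data abort taken without
a change in exception level), WnR = 1 (a write) and DFSC = 0x30, which
is the 48 printed in the log. The architecture defines DFSC 0b110000 as
a TLB conflict abort, which would be consistent with stale TLB entries
surviving a change to the mappings.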

>> Adding an extra flush_tlb_all() in the code path which is
>> changing the kernel permissions allows the machine to boot
>> consistently.
>
> As you mention changing permissions, I take it you're using
> CONFIG_DEBUG_RODATA?

The failing configuration doesn't have DEBUG_RODATA set; I might have
been pretty loose with my terminology.

Frankly, I originally wondered how CONFIG_DEBUG_RODATA was working
reliably, because the flushes were only around the directories getting
split. fixup_init() (and basically anything calling
create_mapping_late()) looked like it had paths that could avoid
flushing. When I added the CONT changes I didn't add flushes to paths
that didn't previously have them (except in the split cont range case,
which matched the split p[mu]d case). I made the mistake of assuming
someone knew about some edge case that avoided the need for the flush.

Once I find/fix the console issue on that machine with 4.4-rc1 (there
are a small handful of issues that keep mainline from working on it,
including the SATA patch that was posted and rejected), I will focus on
hoisting the TLB flush into create_mapping_late() and removing the
splattering of flushes in those code paths. That is, unless there is a
reason to be performing them as soon as the directories are split.






