arm64 crashkernel fails to boot on acpi-only machines due to ACPI regions being no longer mapped as NOMAP

Ard Biesheuvel ard.biesheuvel at linaro.org
Mon Dec 4 06:02:31 PST 2017


On 26 November 2017 at 08:29, Bhupesh SHARMA <bhupesh.linux at gmail.com> wrote:
> Hi Akashi,
>
> On Thu, Nov 16, 2017 at 12:30 PM, AKASHI Takahiro
> <takahiro.akashi at linaro.org> wrote:
>> Bhupesh,
>>
>> On Wed, Nov 15, 2017 at 04:28:55PM +0530, Bhupesh Sharma wrote:
>>>
>>   (snip)
>>
>>> # dmesg | grep -B 2 -i "ACPI reclaim"
>>> [    0.000000] efi:   0x000039670000-0x0000396bffff [Runtime Code       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x0000396c0000-0x00003970ffff [Boot Code          |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>> [    0.000000] efi:   0x000039710000-0x00003975ffff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
>>>
>>> 2. Now, I am not sure which kernel layer does the following changes (I am
>>> still trying to dig it out more), but I see that the 'Boot Code' and ACPI
>>> DSDT table regions are somehow merged into one memblock_region and appear as
>>> range '396c0000-3975ffff' in the '/proc/iomem' interface:
>>>
>>> # cat /proc/iomem | grep -A 2 -B 2 39
>>> 00000000-3961ffff : System RAM
>>>   00080000-00b6ffff : Kernel code
>>>   00cb0000-0167ffff : Kernel data
>>>   0e800000-2e7fffff : Crash kernel
>>> 39620000-396bffff : reserved
>>> 396c0000-3975ffff : System RAM
>>> 39760000-3976ffff : reserved
>>> 39770000-397affff : reserved
>>> 397b0000-3989ffff : reserved
>>> 398a0000-398bffff : reserved
>>> 398c0000-39d3ffff : reserved
>>> 39d40000-3ed2ffff : System RAM
>>>
>>   (snip)
>>>
>>> So, I am looking at what could be causing the 'Boot Code' and 'ACPI DSDT
>>> table' ranges to be merged into a single region at
>>> '0x0000396c0000-0x00003975ffff' which cannot be marked as RESERVED using
>>> 'memblock_is_reserved'.
>>
>> Simple:) The short answer is that memblock_add() does.
>>
>> The long answer:
>> First, please note that memblock maintains two types of region lists,
>> "memory" and "reserved".
>>
>> efi_init()
>>     reserve_regions()
>>         early_init_dt_add_memory_arch()
>>             memblock_add()
>>                 memblock_add_range(memblock.memory)
>>
>> The memory regions described in efi.memmap are added to the "memory" list,
>> with all neighboring regions being merged into one,
>> in this case, "Runtime Code", "Boot Code", "ACPI Reclaim Memory" and others.
>>
>> The secret here is that "Runtime Code" is also marked with "NOMAP" flag in
>> reserve_regions(), which creates an isolated region since it now has
>> a different attribute.
>> Consequently only "Boot Code" and "ACPI Reclaim Memory" are
>> unified.
>>
>> Look at request_standard_resources(). It handles only "memory" list,
>> and doesn't care about whether any arbitrary part of memory is in
>> "reserved" list or not.
>
> Thanks for the pointers. Now I did some experiments and traversed the
> whole memblock path and I see
> how these two regions get merged into a single region which is later
> on recognized by
> 'request_standard_resources()' as a System RAM region rather than a
> RESERVED region.
>
> I recently reproduced this on an APM Mustang with the latest kernel as well
> when ACPI is used to boot the machine, which makes me believe that
> this is a generic issue for arm64 machines running the 4.14 kernel
> when they use acpi=force as the boot method.
>
> I am not sure whether a fix or hack would be suitable for all underlying
> arm64 machines, but I am trying one on the arm64 machines I have to
> see if it fixes the issue.
>
> @Ard:
>
> Hi Ard,
>
> I think creating and testing a clean solution for all arm64 boards
> will take some time. In the meantime, should we consider reverting
> commit [1] to make sure that ACPI-enabled arm64 machines can boot with
> 4.14?
>
> Please let me know your opinion.
>
> [1] f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a (efi/arm: Don't mark
> ACPI reclaim memory as MEMBLOCK_NOMAP)
>

I don't think that is really going to help tbh.

ACPI reclaim regions are not the only regions that are
memblock_reserve()d and need to be reserved by the incoming kernel as
well. So as far as I can tell, this is a symptom of an underlying
issue that we will need to solve, and reverting the code that exposed
it will not make the bug go away.

-- 
Ard.


