arm64/efi handling of persistent memory

Ard Biesheuvel ard.biesheuvel at linaro.org
Fri Dec 18 05:07:10 PST 2015


On 18 December 2015 at 12:45, Mark Rutland <mark.rutland at arm.com> wrote:
> On Fri, Dec 18, 2015 at 11:06:51AM +0000, Leif Lindholm wrote:
>> On Fri, Dec 18, 2015 at 01:33:25AM +0000, Elliott, Robert (Persistent Memory) wrote:
>> > Similar to the questions about the arm64 efi boot stub
>> > handing persistent memory, some of the arm64 kernel code
>> > looks fishy.
>> >
>> > In arch/arm64/kernel/efi.c:
>> >
>> > static int __init is_normal_ram(efi_memory_desc_t *md)
>> > {
>> >         if (md->attribute & EFI_MEMORY_WB)
>> >                 return 1;
>> >         return 0;
>> > }
>> >
>> > static __init int is_reserve_region(efi_memory_desc_t *md)
>> > {
>> >         switch (md->type) {
>> >         case EFI_LOADER_CODE:
>> >         case EFI_LOADER_DATA:
>> >         case EFI_BOOT_SERVICES_CODE:
>> >         case EFI_BOOT_SERVICES_DATA:
>> >         case EFI_CONVENTIONAL_MEMORY:
>> >         case EFI_PERSISTENT_MEMORY:
>> >                 return 0;
>> >         default:
>> >                 break;
>> >         }
>> >         return is_normal_ram(md);
>> > }
>> >
>> > static __init void reserve_regions(void)
>> > {
>> > ...
>> >                 if (is_normal_ram(md))
>> >                         early_init_dt_add_memory_arch(paddr, size);
>> >
>> >                 if (is_reserve_region(md)) {
>> >                         memblock_reserve(paddr, size);
>> > ...
>> >
>> > static bool __init efi_virtmap_init(void)
>> > {
>> > ...
>> >                 if (!is_normal_ram(md))
>> >                         prot = __pgprot(PROT_DEVICE_nGnRE);
>> >                 else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
>> >                          !PAGE_ALIGNED(md->phys_addr))
>> >                         prot = PAGE_KERNEL_EXEC;
>> >                 else
>> >                         prot = PAGE_KERNEL;
>> >
>> > Concerns include:
>> >
>> > 1. is_normal_ram() will see the WB bit and return 1 regardless
>> > of the type.  That seems similar to the arm EFI boot stub issue.
>> > The three callers are shown above.
>>
>> So, first and third cases look OK to me, but the bit where we add
>> things to memblock just for being WB is bogus.
>
> We should be able to do this much more simply by only ever adding what
> we know is safe. We can "reserve" portions by nomapping them. e.g.
>
> bool can_add_to_memblock(efi_memory_desc_t *md)
> {
>         if (!(md->attribute & EFI_MEMORY_WB))
>                 return false;
>
>         switch (md->type) {
>         case EFI_LOADER_CODE:
>         case EFI_LOADER_DATA:
>         case EFI_BOOT_SERVICES_CODE:
>         case EFI_BOOT_SERVICES_DATA:
>         case EFI_CONVENTIONAL_MEMORY:
>                 return true;
>         }
>
>         return false;
> }
>
> for_each_efi_memory_desc(&memmap, md) {
>         if (can_add_to_memblock(md))
>                 early_init_dt_add_memory_arch(...);
>         else
>                 memblock_mark_nomap(...);
> }
>
> Though I suspect Ard might know some reason that won't work.
>

Before we start hacking away at this at the arm64/EFI level, do we
have any documentation and/or consensus regarding how persistent
memory should be treated in the first place? Should it be covered by
memblock? Should it be covered by the linear mapping? Should it be
memblock_reserve()'d?

>> > 2. is_reserve_region() treating EFI_PERSISTENT_MEMORY the same
>> > as EFI_CONVENTIONAL_MEMORY looks wrong.
>>
>> Yeah... That one was introduced by
>> ad5fb870c486 ("e820, efi: add ACPI 6.0 persistent memory types")
>> without any ACKs from ARM people :/
>>
>> While it probably wouldn't wreck your system, it is unlikely to do
>> what you'd want.
>
> It might corrupt that data you wanted to persist, so it's definitely a
> bug. Severity is arguable.
>
>> > 3. We're contemplating working around the grub problem by
>> > reporting EFI_RESERVED_MEMORY plus the NV attribute rather
>> > than EFI_PERSISTENT_MEMORY.
>
> I'm missing something. Robert, do you mean GRUB has a similar bug?
>
>> That sounds a bit ... nuclear.
>> Would you then be expecting to retreive information about the NV
>> device out of hw description, or via PCI, rather than the UEFI memory
>> map?
>
> That's going to cause much more pain. We should fix this code.
>
>> > If this is done, then is_reserve_region() will fall through
>> > to is_normal_ram(), which will see the WB bit and return 1.
>> > That seems backwards... but seems correct for persistent
>> > memory, reporting it as a reserved region.  That might avoid the
>> > the EFI_PERSISTENT_MEMORY handling problem (if the preceding
>> > call to is_normal_ram() didn't already cause problems).
>>
>> So ... the code is convoluted and could probably do with a
>> refresh. But is_normal_ram() returning 1 means is_reserve_region()
>> will return 1, meaning we end up reserving it in memblock and not
>> allocating in it.
>>
>> However, this is for is_reserve_region() - we will still have added it
>> to memblock with early_init_dt_add_memory_arch(), which may have
>> unwanted side effects.
>
> See above for a suggestion on that front.
>
>> I thought Ard had some patches in flight to address this, but they
>> don't appear to be in yet.
>
> The nomap stuff is in the arm64 git repo, in aarch64/efi [1] and
> for-next/core [2]. That alone doesn't seem to address this, though.
>
> Thanks,
> Mark.
>
> [1] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=aarch64/efi
> [2] https://git.kernel.org/cgit/linux/kernel/git/arm64/linux.git/log/?h=for-next/core



More information about the linux-arm-kernel mailing list