[PATCH 0/2] arm64: kexec_file_load vs memory reservations
james.morse at arm.com
Wed Jun 2 09:58:10 PDT 2021
On 02/06/2021 16:59, Marc Zyngier wrote:
> On Wed, 02 Jun 2021 15:22:00 +0100,
> James Morse <james.morse at arm.com> wrote:
>> On 29/04/2021 14:35, Marc Zyngier wrote:
>>> It recently became apparent that using kexec with kexec_file_load() on
>>> arm64 is pretty similar to playing Russian roulette.
>>> Depending on the amount of memory, the HW supported and the firmware
>>> interface used, your secondary kernel may overwrite critical memory
>>> regions without which the secondary kernel cannot boot (the GICv3 LPI
>>> tables being a prime example of such reserved regions).
>>> It turns out that there are at least two ways for reserved memory
>>> regions to be described to kexec: /proc/iomem for the userspace
>>> implementation, and memblock.reserved for kexec_file.
>> One is spilled into the other by request_standard_resources()...
>>> And of course,
>>> our LPI tables are only reserved using the resource tree, leading to
>>> the aforementioned stamping.
>> Presumably well after efi_init() has run...
> Yup, much later. And we can keep on reserving memory as long as we
> boot new CPUs. Having it as a one-off sync doesn't really help here.
It might need doing for all possible CPUs up-front... otherwise someone loads a kexec
kernel and correctly picks a safe location ... then a CPU comes online and reserves a hole
in the middle of it: kexec isn't using the selected location until you reboot().
(memory hotplug has some 'fun' in this area, which can only be fixed by using memblock,
which ought to know about removable memory ranges ... but doesn't)
There does need to be a point where the list of reserved memory stops changing.
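To make the ordering problem concrete, here is a minimal user-space sketch in plain C. It is not kernel code: struct range, ranges_overlap() and placement_is_safe() are hypothetical names invented for this illustration, and the real checks in the kexec and memblock code differ in detail. It only shows that a placement validated against the reserved list at load time can stop being valid once the list grows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: a physical range [base, base + size). */
struct range {
	uint64_t base;
	uint64_t size;
};

/* True if the two ranges share at least one byte. */
static bool ranges_overlap(struct range a, struct range b)
{
	return a.base < b.base + b.size && b.base < a.base + a.size;
}

/* True if 'candidate' avoids every entry in the reserved list. */
static bool placement_is_safe(struct range candidate,
			      const struct range *reserved, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		if (ranges_overlap(candidate, reserved[i]))
			return false;
	return true;
}
```

In this model, an image at 0x90000000 of size 0x2000000 passes the check while the only reservation is at 0x80000000. If a later reservation lands at 0x90100000 (say, on behalf of a CPU that came online after the load), the same placement no longer passes, yet in this model the check only ran at load time.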
>>> Similar things could happen with ACPI tables as well.
>> efi_init() calls reserve_regions(), which has:
>> | /* keep ACPI reclaim memory intact for kexec etc. */
>> | if (md->type == EFI_ACPI_RECLAIM_MEMORY)
>> | 	memblock_reserve(paddr, size);
>> This is also what stops mm from allocating them, as
>> memblock-reserved gets copied into the PG_reserved flag by
>> free_low_memory_core_early()'s calls to reserve_bootmem_region().
>> Is your machine's firmware putting them in a region with a different type?
> Good question. Moritz (cc'd) saw the tables being overwritten on his
> system (which I don't have access to), so I guess it is not entirely
> clear cut how this happens.
If we have systems that store the tables in 'conventional memory' we have bigger problems!
> My SQ box reports the ACPI region as "ACPI Reclaim", so I guess it
> works as expected here.
>> (The UEFI spec has something to say: see 2.3.6 "AArch64 Platforms":
>> | ACPI Tables loaded at boot time can be contained in memory of type EfiACPIReclaimMemory
>> | (recommended) or EfiACPIMemoryNVS
>> NVS would fail the is_usable_memory() check earlier, so gets treated
>> as nomap)
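The treatment described above can be summarised with a small toy model in plain C. This is a sketch under assumptions, not the kernel's code: enum mem_type, struct mem_policy, toy_is_usable() and toy_classify() are hypothetical names, only the three memory types discussed in this thread are modelled, and the attribute checks that the real is_usable_memory() performs are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the UEFI memory types discussed here. */
enum mem_type {
	MEM_CONVENTIONAL,
	MEM_ACPI_RECLAIM,
	MEM_ACPI_NVS,
};

struct mem_policy {
	bool nomap;	/* marked nomap, so not treated as usable RAM */
	bool reserved;	/* memblock-reserved, so never handed to the allocator */
};

/* Simplified stand-in for the is_usable_memory() check (type only). */
static bool toy_is_usable(enum mem_type type)
{
	return type == MEM_CONVENTIONAL || type == MEM_ACPI_RECLAIM;
}

/* Mirrors the order of the two tests quoted from reserve_regions(). */
static struct mem_policy toy_classify(enum mem_type type)
{
	struct mem_policy p = { false, false };

	if (!toy_is_usable(type))
		p.nomap = true;
	/* keep ACPI reclaim memory intact for kexec etc. */
	if (type == MEM_ACPI_RECLAIM)
		p.reserved = true;
	return p;
}
```

Under this model ACPI reclaim memory ends up reserved but mapped, NVS ends up nomap, and conventional memory gets neither protection, which is why tables stored there would be a bigger problem.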
> Note that I've since changed tactics and proposed that we fully rely
> on the resource tree instead.
Yup - I came back here to work out why you gave up on memblock:reserving the reserved