[PATCH] efi: arm-stub: Correct FDT and initrd allocation rules for arm64
Ard Biesheuvel
ard.biesheuvel at linaro.org
Thu Feb 9 10:26:14 PST 2017
On 9 February 2017 at 18:18, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
> On 9 February 2017 at 18:01, Jeffrey Hugo <jhugo at codeaurora.org> wrote:
>> On 2/9/2017 10:45 AM, Ard Biesheuvel wrote:
>>>
>>> On 9 February 2017 at 17:41, Jeffrey Hugo <jhugo at codeaurora.org> wrote:
>>>>
>>>> On 2/9/2017 10:16 AM, Ard Biesheuvel wrote:
>>>>>
>>>>>
>>>>> On 9 February 2017 at 17:06, Jeffrey Hugo <jhugo at codeaurora.org> wrote:
>>>>>>
>>>>>>
>>>>>> On 2/9/2017 3:16 AM, Ard Biesheuvel wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On arm64, we have made some changes over the past year to the way the
>>>>>>> kernel itself is allocated and to how it deals with the initrd and
>>>>>>> FDT. This patch brings the allocation logic in the EFI stub in line
>>>>>>> with that, which is necessary because the introduction of KASLR has
>>>>>>> created the possibility for the initrd to be allocated in a place
>>>>>>> where the kernel may not be able to map it. (This is currently a
>>>>>>> theoretical scenario, since it only affects systems where the size of
>>>>>>> RAM exceeds the size of the linear mapping.)
>>>>>>>
>>>>>>> So adhere to the arm64 boot protocol, and make sure that the initrd is
>>>>>>> fully inside a 1GB aligned 32 GB window that covers the kernel as
>>>>>>> well.
>>>>>>>
>>>>>>> The FDT may be anywhere in memory on arm64 now that we map it via the
>>>>>>> fixmap, so we can lift the address restriction there completely.
>>>>>>>
>>>>>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>>>>>>> ---
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'll give this a test on our platform that was running into the current
>>>>>> limitation - probably this weekend.
>>>>>>
>>>>>> I reviewed the code and it's OK, but I do have one question. Do we need
>>>>>> to handle the case where the initrd ends up below the kernel?
>>>>>>
>>>>>> Let's assume KASLR puts the kernel somewhere up high in DDR, after the
>>>>>> 32GB mark. Now let's assume the unlikely scenario that the initrd won't
>>>>>> fit anywhere after 32GB, but will fit before 32GB. Per my understanding
>>>>>> of efi_high_alloc, it will put the initrd before the 32GB mark, which
>>>>>> will be outside of the window where the kernel is.
>>>>>>
>>>>>
>>>>> The 32 GB window does not have to be 32 GB aligned, only 1 GB aligned.
>>>>> So as long as the following expression holds, we should be fine:
>>>>>
>>>>>
>>>>> align(max(kernel_end, initrd_end), 1g) - align_down (min
>>>>> (kernel_start, initrd_start), 1g) <= 32g
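
For readers who want to plug numbers into the expression above, here is a minimal sketch in C. It is not code from the patch: the helper names (align_up, align_down, fits_boot_window) are made up for illustration, and end addresses are assumed to be exclusive.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G  (UINT64_C(1) << 30)
#define SZ_32G (UINT64_C(32) << 30)

/* Hypothetical helpers; 'a' must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t a)
{
    return x & ~(a - 1);
}

static uint64_t align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/*
 * Returns true if the kernel and the initrd both fit inside one
 * 1 GB aligned window of at most 32 GB. End addresses are exclusive
 * (an assumption of this sketch).
 */
static bool fits_boot_window(uint64_t kernel_start, uint64_t kernel_end,
                             uint64_t initrd_start, uint64_t initrd_end)
{
    uint64_t lo = kernel_start < initrd_start ? kernel_start : initrd_start;
    uint64_t hi = kernel_end > initrd_end ? kernel_end : initrd_end;

    return align_up(hi, SZ_1G) - align_down(lo, SZ_1G) <= SZ_32G;
}
```

Because only the span between the lowest start and the highest end matters, the check is the same whether the initrd sits above or below the kernel.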
>>>>
>>>>
>>>>
>>>> Yes, and I argue there is a possibility (we'll call it extremely remote)
>>>> where that may not hold. My question is, do we care about that
>>>> possibility, and if so, do we do anything about it?
>>>>
>>>
>>> We allocate top down, so we start at align_down(base_of_image, 1g) +
>>> 32g, and go down until we hit a region that fits our initrd. We will
>>> disregard the region that the kernel occupies, but below that, we will
>>> just proceed until we find a slot. This effectively means we have a 63
>>> GB window, with the kernel in the middle, where we can load the initrd
>>> and adhere to the boot protocol. I don't see how we could end up in
>>> the situation where we load the kernel somewhere, and both the 31 GB
>>> before *and* after are completely occupied.
>>>
>>
>> No we don't. We do not allocate top down. Please look at efi_high_alloc.
>>
>> efi_high_alloc iterates through the memory map, low to high.
BTW the memory map isn't necessarily sorted per the UEFI spec, so it
iterates in what is essentially random order, not low to high.
>> It looks to see
>> if a slot can hold the allocation, and the slot does not exceed the
>> specified max. If so, efi_high_alloc retains a reference to the slot. Then
>> efi_high_alloc continues iterating through the map, until the end.
>> efi_high_alloc only stores a reference to the most recently valid slot,
>> which would be the highest slot in the map.
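
To make the behaviour under discussion concrete, here is a simplified sketch in C of a "highest address not above max" search. It is not the kernel's efi_high_alloc: struct mem_region and high_alloc_sketch are hypothetical stand-ins for the EFI memory descriptor and the stub helper. It assumes the allocator compares candidates by address rather than keeping the last match, which is what makes the result independent of map order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an EFI memory map entry. */
struct mem_region {
    uint64_t start;    /* physical base address */
    uint64_t size;     /* length in bytes */
    int      is_free;  /* non-zero for conventional (usable) memory */
};

/*
 * Return the highest 'align'-aligned address at which 'size' bytes fit
 * inside a free region without extending above 'max', or 0 if no region
 * qualifies. 'align' must be a power of two. The map may be in any order.
 */
static uint64_t high_alloc_sketch(const struct mem_region *map, size_t n,
                                  uint64_t size, uint64_t align, uint64_t max)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = map[i].start;
        uint64_t end = start + map[i].size;
        uint64_t cand;

        if (!map[i].is_free || map[i].size < size)
            continue;
        if (end > max)
            end = max;                  /* clamp the region to the limit */
        if (end < start || end - start < size)
            continue;                   /* nothing usable below 'max' */
        cand = (end - size) & ~(align - 1);
        if (cand < start)
            continue;                   /* alignment pushed us out of the region */
        if (cand > best)
            best = cand;                /* keep the highest candidate so far */
    }
    return best;
}
```

With a comparison like this, a map where everything between 150GB and the kernel is occupied yields a result just below 150GB regardless of iteration order, which is the scenario being debated.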
>>
>
> It is documented as
>
> /*
> * Allocate at the highest possible address that is not above 'max'.
> */
>
> and what you describe is pretty much that, no?
>
>> My system can have 256GB (or more) of RAM. It is possible, however remote,
>> that the initrd and kernel can be more than 64GB away from each other.
>>
>> Let's assume KASLR puts the kernel at 250GB. Let's assume, for whatever
>> reason, we can't fit the initrd above 150GB (there was just enough room to
>> jam the kernel there somehow, but firmware is consuming the rest, maybe it
>> put rootfs there via NFIT).
>
> So before even booting the kernel, you already have 100 GB of memory
> occupied? As I replied before, you are correct that in this case, you
> will not be able to put the initrd within 32 GB of the kernel. But do
> note that this 32 GB figure is derived from the linear region size of
> a 16k pages kernel with 2 levels of translation, which is a niche
> configuration by itself. On a system that has 256 GB of RAM, it is
> highly unlikely that you will be using a kernel that can only map 32
> GB of it.
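
The arithmetic behind the 32 GB figure can be sketched as follows. This is an illustration, not kernel code: va_bits and linear_region_size are made-up names, and it assumes 8-byte table entries, fully populated translation levels, and a linear region that takes half of the kernel VA space.

```c
#include <stdint.h>

/*
 * Each translation level resolves (page_shift - 3) bits, because a
 * page-sized table holds page_size / 8 entries; the page offset
 * contributes the remaining page_shift bits.
 */
static unsigned va_bits(unsigned page_shift, unsigned levels)
{
    return page_shift + levels * (page_shift - 3);
}

/* Assumption of this sketch: the linear region is half the VA space. */
static uint64_t linear_region_size(unsigned page_shift, unsigned levels)
{
    return UINT64_C(1) << (va_bits(page_shift, levels) - 1);
}
```

For 16k pages (page_shift 14) and 2 levels this gives 14 + 2 * 11 = 36 VA bits, so a 2^35 byte (32 GB) linear region. For 4k pages with 3 levels it gives 39 bits and a 256 GB linear region.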
>
> The reason for choosing the 32 GB figure is that it relieves the boot
> loader from having to go and figure out what kind of kernel is going
> to be executed. Page size can be read from the Image header but the VA
> size cannot. So 32 GB was a reasonable number imo.
>
>> efi_high_alloc will put the initrd at some point
>> just below 150GB, because it iterates low to high,
>
> No, because everything above that is occupied. If efi_high_alloc()
> does not do what it says on the tin, we should fix that.
>
>> and 150GB will be below
>> the max of 250GB where the kernel is. This will result in the initrd and
>> kernel being ~100GB away in this example, which violates the requirements
>> stated in Booting.txt
>>
>> I see the situation is possible, but I admit it is remote. If you want to
>> ignore it, fine. I would be happy with that so long as the assumption is
>> documented so that if it is ever somehow violated in the real world, we know
>> what broke.
>>