[RFC] arm64: extra entries in /proc/iomem for kexec

James Morse james.morse at arm.com
Tue Apr 24 09:08:57 PDT 2018


Hi Akashi,

On 16/04/18 11:08, AKASHI Takahiro wrote:
> On Thu, Apr 12, 2018 at 05:01:52PM +0100, James Morse wrote:
>> On 05/04/18 03:42, AKASHI Takahiro wrote:
>>> On Mon, Apr 02, 2018 at 10:53:32AM +0900, AKASHI Takahiro wrote:
>>>> Basically, changes that I made on /proc/iomem in my new format D were:
>>>> 1. to move NOMAP region entries, formerly named "reserved" and now named
>>>>    "reserved (no map)", under System RAM
>>>> 2. to add new entries for firmware-reserved regions as "reserved" also
>>>>    under System RAM
>>>>
>>>> On the other hand, current kexec-tools, in particular kexec command,
>>>> only scan top-level "System RAM" entries as well as "reserved" entries.
>>
>> as well as?
> 
> I had few words here.
> The current kexec-tools assumes that "reserved" entries appear only
> at the top level. So,
> 
>> Does this mean kexec will pick up the reserved region if its written as:
>> | 00001000-0009d7ff : System RAM
>> |    00001000-00001fff  : reserved
> 
> if this is the case, the range "0x1000-0x1fff" is added to an internal
> list of memory ranges 

I found this in get_memory_ranges_iomem_cb()...


> but will later be *ignored* by locate_hole() function
> due to its memory type.

Ugh. Great.


> That is, the range can potentially be overwritten by loaded kernel/initrd.

So two kernel bugs, one user-space bug, all conspiring.


>>>> either because
>>>> a. new kernel (or initrd/dtb) may have been allocated on a NOMAP region
>>>>    which are not suitable for usable memory, or
>>>> b. new kernel (or initrd/dtb) may have been allocated on a reserved region
>>>>    whose contents can be overwritten.
>>>>
>>>> While we see (b) even today, (a) is a backward compatibility issue.
>>
>> (a) doesn't happen because request_standard_resources() checks
>> memblock_is_nomap(), and reports those regions as 'reserved'.
> 
> I might have confused you. The assumption here was that we adopt format (D),
> where all NOMAP regions are sub nodes of "System RAM", but still use
> the current kexec-tools.
> As I said above, this will end up an un-expected behavior.

I'd like to fix this without having to fix user-space at the same time. It looks
like no-one else has second level reserved regions, so we can't blame
kexec-tools for looking straight at them, then ignoring them.


>> We can't expect user-space to upgrade to fix this issue.
> 
> I'm not sure what you mean here; we can't fix the issue anyway
> without changing user-space/kexec-tools as kexec_load system call totally
> relies on parameters passed by kexec-tools.
> (The only difference is whether we need additional kernel changes or not.)

It looks like this was always broken because the efi memory map isn't listed as
'reserved' in /proc/iomem. The fallout for the new stuff is secondary.


>>> # I don't know yet whether people are happy with this fix, and also have
>>>   kernel patches for my other approaches. They are neither not much
>>>   complicated.
>>
>> I don't think we should fix this in userspace, exporting all the
>> memblock_reserved() regions as 'reserved' in /proc/iomem looks like the right
>> thing to do.
> 
> Again, if you modify /proc/iomem, you have to update kexec-tools, too.

If we squash the memblock_reserved() stuff down so it appears as a top level
'reserved' region too, I don't think we do. This prevents the efi-memory-map
being overwritten on kernels since kexec was merged.

Its horribly fiddly to do this. The kernel code/data are special reserved
regions that we already describe as a subset of system-ram, even though they are
both also fragments of a bigger memblock_reserved() block.

While we can walk memblock for regions that aren't reserved, allocating memory
in the loop changes what is reserved. That one O(N) walk ends up being four...

I'm almost done tearing my hair out, I should have a working patch soon...


>> wasn't there going to be another version, with the core EFI
>> stuff split out?
> 
> ? I don't remember well ...

https://lkml.org/lkml/2018/2/1/496


Thanks,

James




More information about the linux-arm-kernel mailing list