[RFC] arm64: extra entries in /proc/iomem for kexec

Thu Apr 26 07:26:29 PDT 2018

Hi Akashi,

On 26/04/18 08:40, AKASHI Takahiro wrote:
> On Wed, Apr 25, 2018 at 02:22:07PM +0100, James Morse wrote:
>> On 25/04/18 10:20, AKASHI Takahiro wrote:
>>> On Tue, Apr 24, 2018 at 05:08:57PM +0100, James Morse wrote:
>>>> If we squash the memblock_reserved() stuff down so it appears as a top level
>>>> 'reserved' region too, I don't think we do.
>>>
>>> If I correctly understand, you're talking about my format (E).
>>> As I said, it will fix the issue without modifying user-space, but
>>>
>>> || This does not only look quite noisy but also ignores the fact that
>>> || reserved regions are part of System RAM (or memblock.memory).
>>
>> I agree it's noisy, there are significantly more 'reserved' areas, but these are
>> all either nomap or memblock_reserved().
>>
>> Why does it matter if a reserved-region is nomap or memblock_reserved()? Any new
>> kernel will learn the difference from the EFI memory map and make its own decisions.
> 
> Yeah, kernel can do (though kernel won't look through the system resources list
> for this purpose anyway), what about kexec-like user applications?
> It may want to seek /proc/iomem to identify all the *usable* memory on
> the system, that is "System RAM", but doesn't care whether some range is
> reserved or not (for some reason) yet does care !NOMAP.

Do you have an example application?
This would have to be a program digging in /dev/mem where it wants to touch
memory the kernel has reserved, but doesn't want to receive a signal if it
touches memory that's nomap. This doesn't seem a likely use-case.

We could change the names for the memblock_reserved()/nomap entries, but as
kexec-tools spots 'reserved' and almost does the right thing, I kept it as it is.
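
To make the 'reserved' handling concrete, here is a rough sketch of how a
user-space tool might work out usable memory from /proc/iomem-style text. It
is not what kexec-tools does; the helper names and the sample layout are
invented for illustration. It assumes two spaces of indentation per nesting
level, skips top-level 'reserved' entries, and subtracts any 'reserved'
children nested under "System RAM":

```python
import re

# Matches lines such as "  41000000-411fffff : reserved".
LINE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")

# Invented sample layout, not taken from a real machine.
SAMPLE = """\
40000000-4fffffff : System RAM
  40080000-40ffffff : Kernel code
  41000000-411fffff : reserved
  41200000-413fffff : Kernel data
50000000-500fffff : reserved
50100000-5fffffff : System RAM
  5ff00000-5fffffff : reserved
"""


def parse_iomem(text):
    """Yield (depth, start, end, name) for each well-formed line."""
    for raw in text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        indent, start, end, name = m.groups()
        # Assumption: two spaces of indentation per nesting level.
        yield len(indent) // 2, int(start, 16), int(end, 16), name.strip()


def usable_ranges(text):
    """Return top-level "System RAM" ranges minus nested 'reserved' holes."""
    blocks = []      # (start, end, holes) per top-level System RAM entry
    current = None   # block that nested entries currently belong to
    for depth, start, end, name in parse_iomem(text):
        if depth == 0:
            current = None
            if name == "System RAM":
                current = (start, end, [])
                blocks.append(current)
        elif current is not None and name == "reserved":
            current[2].append((start, end))

    usable = []
    for start, end, holes in blocks:
        cursor = start
        for h_start, h_end in sorted(holes):
            if h_start > cursor:
                usable.append((cursor, h_start - 1))
            cursor = max(cursor, h_end + 1)
        if cursor <= end:
            usable.append((cursor, end))
    return usable
```

Kernel code/data stay inside the usable ranges in this sketch; a tool that
wants to avoid them would need to treat those names as holes too.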

>>>> This prevents the efi-memory-map
>>>> being overwritten on kernels since kexec was merged.
>>>>
>>>> It's horribly fiddly to do this. The kernel code/data are special reserved
>>>> regions that we already describe as a subset of system-ram, even though they are
>>>> both also fragments of a bigger memblock_reserved() block.
>>>
>>> Actually, we don't have to avoid kernel code/data regions as copying
>>> loaded data to the final destinations will be done at the very end of kexec.
>>
>> For kexec yes, but that is the existing format of the file, which we shouldn't
>> change, otherwise we break something else.
> 
> One trivial downside in this approach is that a secondary kernel will be
> loaded at an address different from the one of current kernel.
> While it is sane, it looks a bit odd that, every time kexec'ed, a new
> kernel (code/data) is located back and forth :)

Yes, but all versions of the kernel that support kexec will be quite happy with
this. The memory below the kernel could be re-used since KASLR support was
merged before kexec.

I was more worried that the extra fragmentation would cause kexec-tools to stop
searching early, as it seems to have a #define'd limit of how much of that file
it will parse. But, this would be an existing bug, because there could be many
nomap regions up-front before any large-enough chunk of system-ram appears.
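
A toy illustration of that failure mode, assuming a parser that gives up after
a fixed number of entries. The `max_ranges` parameter and the entry layout are
hypothetical; I have not copied the real limit or logic from kexec-tools:

```python
def first_fit(entries, needed, max_ranges):
    """Return the first "System RAM" range of at least `needed` bytes.

    `entries` is a list of (start, end, name) tuples in file order.
    Scanning stops after `max_ranges` entries, mimicking a fixed-size
    parse limit (hypothetical, not the real kexec-tools value).
    """
    for count, (start, end, name) in enumerate(entries):
        if count >= max_ranges:
            return None  # limit hit before a suitable range was seen
        if name == "System RAM" and end - start + 1 >= needed:
            return (start, end)
    return None


# Invented layout: four small reserved/nomap regions up-front, then RAM.
ENTRIES = [
    (0x40000000, 0x4000ffff, "reserved"),
    (0x40010000, 0x4001ffff, "reserved"),
    (0x40020000, 0x4002ffff, "reserved"),
    (0x40030000, 0x4003ffff, "reserved"),
    (0x40040000, 0x7fffffff, "System RAM"),
]
```

With this layout, a limit of four entries never reaches the System RAM range,
while a larger limit finds it.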

Thanks,

James