makedumpfile: question about memory hole
HATAYAMA Daisuke
d.hatayama at jp.fujitsu.com
Tue May 14 21:00:41 EDT 2013
(2013/05/14 10:55), Atsushi Kumagai wrote:
> Hello HATAYAMA-san,
>
> Sorry for the delayed response, again...
>
> On Fri, 29 Mar 2013 17:13:11 +0900
> Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp> wrote:
>
>> Hello HATAYAMA-san,
>>
>> Sorry for the delayed response.
>>
>> On Tue, 19 Mar 2013 17:47:45 +0900 (JST)
>> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
>>
>>>>>>> What I don't understand well is that the part here:
>>>>>>>
>>>>>>> pfn_start = paddr_to_pfn(phys_start);
>>>>>>> pfn_end = paddr_to_pfn(phys_end);
>>>>>>>
>>>>>>> if (!is_in_segs(pfn_to_paddr(pfn_start)))
>>>>>>> pfn_start++;
>>>>>>>
>>>>>>> phys_start and pfn_to_paddr(pfn_start) should belong to the same page
>>>>>>> frame, so I suspect the pfn_start should be included in vmcore.
>>>>>>>
>>>>>>> Looking into kexec-tool side, I don't see additional modification made
>>>>>>> to phys_start after it's parsed from /proc/iomem or counterpart on EFI
>>>>>>> interface. Is there any assumption about memory holes behind kernel?
>>>>>>
>>>>>> Here is a PT_LOAD segment of ia64 machine which I actually use:
>>>>>>
>>>>>> Type Offset VirtAddr PhysAddr
>>>>>> FileSiz MemSiz Flags Align
>>>>>> [...]
>>>>>> LOAD 0x000000015fd0b490 0xe0000040ffda5000 0x00000040ffda5000
>>>>>> 0x000000000005a000 0x000000000005a000 RWE 0
>>>>>>
>>>>>> In this case, pfn_to_paddr(pfn_start) is aligned to 0x40ffda4000
>>>>>> because the page size is 16KiB, and this address is out of PT_LOAD
>>>>>> segment.
>>>>>>
>>>>>> phys_start
>>>>>> = 0x40ffda5000
>>>>>> |------------- PT_LOAD ----------------
>>>>>> ----+----------+----------+----------+--------
>>>>>> | pfn:N | pfn:N+1 | pfn:N+2 | ...
>>>>>> ----+----------+----------+----------+--------
>>>>>> |
>>>>>> pfn_to_paddr(pfn:N)
>>>>>> = 0x40ffda4000
>>>>>>
>>>>>> The statement you said is for care the case that phys_start isn't aligned
>>>>>> with the page size.
>>>>>>
>>>>>> BTW, I'll add a comment to explain this intention into here.
>>>>>
>>>>> Thanks for the pictorial explanation. It's easy to understand.
>>>>>
>>>>> Still I think pfn:N should be included in vmcore. The current
>>>>> implementation drops [0x40ffda5000, 0x40ffda8000] that is contained in
>>>>> the PT_LOAD. Or, the range must be hole or other kinds of unnecessary
>>>>> memory from some kernel-side assumption?
>>>>
>>>> Oh, I understand your question correctly now.
>>>>
>>>> When Ohmichi-san wrote this code, he thought the page which include
>>>> memory hole isn't be used. This came from the fact that the basic
>>>> unit of memory management is *page*, but there is no detailed
>>>> investigation.
>>>
>>> You mean on at least IA64 case such parts are always holes?
>>
>> I showed the IA64 case just to say that the statement can be executed
>> actually and it's meaningful code, and this is from my misunderstanding
>> of your question.
>> Whether such parts are holes or not is another matter, and I haven't
>> enough information to decide it now.
>>
>>>>
>>>> So, if there is any case where pfn:N is actually used, this statement
>>>> should be removed. Maybe, does this question come from an idea of such
>>>> cases ?
>>>
>>> I'm wondering if such case can actually happens.
>>
>> I checked a memory map on another IA64 machine and found the regions
>> that not be aligned by page-size:
>>
>> # cat /proc/iomem | grep System
>> ...
>> 4040000000-40fea09fff : System RAM
>> 40fea0a000-40fef5ffff : System RAM // include"pfn:N" 40fea0a000-
>> 40fef60000-40fef63fff : System RAM
>>
>> According to this, it seems that such regions can be exist normally
>> at least on IA64. So, what we should investigate is how does kernel
>> manage such regions (e.g. [0x40fea0a000, 0x40fea0c000]).
>> And this is the "kernel-side assumption" you said first, right ?
>
> First, the memory map(iomem_resource) is made from EFI memory map
> with efi_initialize_iomem_resources(), then no rounding occurs.
> And EFI page size is 4KB(EFI_PAGE_SHIFT == 12), so it is natural
> that some regions aren't aligned by linux kernel page size.
>
> Anyway, I found the case that "pfn:N" mentioned in previous mail was
> actually used on the IA64 machine.
>
>>>>>> |------------- PT_LOAD ----------------
>>>>>> ----+----------+----------+----------+--------
>>>>>> | pfn:N | pfn:N+1 | pfn:N+2 | ...
>>>>>> ----+----------+----------+----------+--------
>
> Here is the machine's /proc/iomem and dmesg:
>
> # cat /proc/iomem | grep System
> ...
> 4040000000-40fea09fff : System RAM
> 40fea0a000-40fef5ffff : System RAM // start address corresponds to "pfn:N"
> 40fef60000-40fef63fff : System RAM
>
> # dmesg
> ...
> rsvd_region[0]: [0xe000000001000000, 0xe0000000010000a8)
> rsvd_region[1]: [0xe000000004000000, 0xe000000004e94e68)
> rsvd_region[2]: [0xe0000040fea0a010, 0xe0000040fea0a060) // stored in "pfn:N"
> rsvd_region[3]: [0xe0000040fea0dfd8, 0xe0000040fea0e010)
> rsvd_region[4]: [0xe0000040fea10000, 0xe0000040fef5fc79)
> rsvd_region[5]: [0xe0000040fefd0010, 0xe0000040fefd0790)
> rsvd_region[6]: [0xffffffffffffffff, 0xffffffffffffffff)
>
> // these are virtual addresses, __pa(0xe0000040fea0a010) = 0x40fea0a010
>
> According to reserve_memory(), rsvd_region[2] is used to save
> ia64_boot_param->command_line. This means that "pfn:N" can
> include valid dates, we shouldn't remove it as holes.
>
> Thank you for pointing out this issue, I'll fix it.
Thanks for your investigation. I'm now very clear to what's happening there.
--
Thanks.
HATAYAMA, Daisuke
More information about the kexec
mailing list