[PATCH makedumpfile] Fix incorrect PFN exclusion when LOAD segments overlap
HAGIO KAZUHITO(萩尾 一仁)
k-hagio-ab@nec.com
Tue Oct 8 18:47:40 PDT 2024
Hi Ming,
thank you for the detailed description.
On 2024/10/08 17:43, Ming Wang wrote:
>> On 2024/08/14 11:43, Ming Wang wrote:
>>> When iterating through LOAD segments to exclude pages in
>>> `exclude_nodata_pages`, simply using `phys_end` to track allocated
>>> PFNs can lead to the erroneous exclusion of valid pages from a
>>> subsequent LOAD segment if its `phys_start` overlaps with the
>>> previous segment's address range.
>>>
>>> This patch addresses the issue by checking for such overlaps with the
>>> next LOAD segment's physical address range. If an overlap is
>>> detected, we continue including PFNs from the current segment until
>>> reaching the end of the overlapping region.
>>>
>>> This fix ensures that all valid pages within overlapping LOAD segments
>>> are correctly included during PFN exclusion.
>>
>> I have a few questions:
>>
>> - I heard from Masa that the overlap is caused by a bug; hasn't it been
>>   fixed on the kernel side? Or, if it will be fixed, is this patch for
>>   existing kernels that still have the bug?
> The overlap you mentioned is not caused by a kernel bug and is not expected to be fixed in the kernel.
I see.
>>
>> - exclude_nodata_pages() is for dumpfiles created by "makedumpfile -E",
>> which uses p_filesz in program headers to record existing pages after
>> exclusion. The function excludes pages only between
>> phys_start+file_size and phys_end, so if p_filesz == p_memsz, it will
>> not exclude any pages.
> My system's page size is 16K. Consider the following LOAD segment information:
>
>             phys_start       phys_end        virt_start          virt_end
> LOAD[ 0]        200000        2186200  9000000000200000  9000000002186200
> LOAD[ 1]        200000        ee00000  9000000000200000  900000000ee00000
>
> I added the following print statement within the exclude_nodata_pages function:
>
>         while (pfn < pfn_end) {
>                 ERRMSG("\nfunc:%s: pfn: %#llx phys_start: %#llx phys_end: %#llx file_size: %lu pfn_end:%#llx\n",
>                        __func__, pfn, phys_start, phys_end,
>                        (unsigned long)file_size, pfn_end);
>                 clear_bit_on_2nd_bitmap(pfn, cycle);
>                 ++pfn;
>         }
>
> The following output was observed:
>
> func:exclude_nodata_pages: pfn: 0x861 phys_start: 0x200000 phys_end: 0x2186200 file_size: 33055232 pfn_end:0x862
>
> After this print statement, clear_bit_on_2nd_bitmap is called, effectively clearing the bitmap bit for pfn: 0x861.
> However, pfn: 0x861 is a valid PFN that happens to store the kernel's init_uts_ns. This leads to an error when the
> crash tool attempts to detect the kernel version:
>
> read_diskdump: PAGE_EXCLUDED: paddr/pfn: 2185d98/861
> crash: page excluded: kernel virtual address: 9000000002185d98 type: "init_uts_ns"
So the calculated pfn:0x861 is wrong?
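(Checking the numbers with the 16K page size: phys_start + file_size =
0x200000 + 0x1f86200 = 0x2186200, so pfn = 0x2186200 >> 14 = 0x861 and
pfn_end = roundup(0x2186200, 16K) >> 14 = 0x862. The loop therefore clears
exactly the last, partially filled page, and that page is the one holding
init_uts_ns at 0x2185d98.)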
> LOAD 0x000000000000c000 0x9000000000200000 0x0000000000200000
> 0x0000000002550400 0x0000000002550400 RWE 0x0
This is a different example; here it's expected that p_memsz equals p_filesz.
But the sizes are not aligned to the page size 16K, and with

    pfn = paddr_to_pfn(phys_start + file_size);

it looks like this rounds down into the last valid page, so that page gets
excluded even though it still contains data? So:

-	pfn = paddr_to_pfn(phys_start + file_size);
+	pfn = paddr_to_pfn(roundup(phys_start + file_size, PAGESIZE()));
	pfn_end = paddr_to_pfn(roundup(phys_end, PAGESIZE()));

How about this instead of your patch?
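To double-check, here is a minimal standalone sketch of the arithmetic. The
paddr_to_pfn()/roundup() macros below are simplified stand-ins I wrote for
this example assuming a 16K page size, not makedumpfile's exact definitions:

    #include <stdio.h>

    /* simplified stand-ins, assuming a 16K page (PAGESHIFT == 14) */
    #define PAGESIZE_16K    16384ULL
    #define PAGESHIFT_16K   14
    #define paddr_to_pfn(X) ((X) >> PAGESHIFT_16K)
    #define roundup(x, y)   ((((x) + ((y) - 1)) / (y)) * (y))

    int main(void)
    {
            /* values taken from the LOAD[ 0] output above */
            unsigned long long phys_start = 0x200000ULL;
            unsigned long long phys_end   = 0x2186200ULL;
            unsigned long long file_size  = 33055232ULL;    /* 0x1f86200 */

            /* current code: points at the last, partially filled page */
            unsigned long long pfn_old = paddr_to_pfn(phys_start + file_size);
            /* proposed change: round up first, so the partial page is kept */
            unsigned long long pfn_new = paddr_to_pfn(roundup(phys_start + file_size,
                                                              PAGESIZE_16K));
            unsigned long long pfn_end = paddr_to_pfn(roundup(phys_end, PAGESIZE_16K));

            printf("old: pfn=%#llx pfn_end=%#llx\n", pfn_old, pfn_end);
            printf("new: pfn=%#llx pfn_end=%#llx\n", pfn_new, pfn_end);
            return 0;
    }

This prints pfn=0x861 for the current code, so the while loop clears pfn
0x861, and pfn=0x862 for the rounded-up version, where pfn == pfn_end and
nothing is excluded.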
Thanks,
Kazu
>> I would like to check how they look on your machine, so could I have the
>> output of "readelf -l /proc/vmcore"?
> The output of the command readelf -l /proc/vmcore executed on my system is provided below.
>
> Note that this output corresponds to a different kernel version than the one used in the preceding example:
>
> [root@localhost ~]# readelf -l /proc/vmcore
>
> Elf file type is CORE (Core file)
> Entry point 0x0
> There are 8 program headers, starting at offset 64
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   NOTE           0x0000000000004000 0x0000000000000000 0x0000000000000000
>                  0x0000000000004a54 0x0000000000004a54         0x4
>   LOAD           0x000000000000c000 0x9000000000200000 0x0000000000200000
>                  0x0000000002550400 0x0000000002550400  RWE    0x0
>   LOAD           0x0000000002560000 0x9000000000200000 0x0000000000200000
>                  0x000000000ec00000 0x000000000ec00000  RWE    0x0
>   LOAD           0x0000000011160000 0x9000000090400000 0x0000000090400000
>                  0x0000000011200000 0x0000000011200000  RWE    0x0
>   LOAD           0x0000000022360000 0x90000000f8e00000 0x00000000f8e00000
>                  0x00000000000c0000 0x00000000000c0000  RWE    0x0
>   LOAD           0x0000000022420000 0x90000000f8ed0000 0x00000000f8ed0000
>                  0x0000000004c70000 0x0000000004c70000  RWE    0x0
>   LOAD           0x0000000027090000 0x90000000fe100000 0x00000000fe100000
>                  0x0000000f81f00000 0x0000000f81f00000  RWE    0x0
>   LOAD           0x0000000fa8f90000 0x9000100080000000 0x0000100080000000
>                  0x0000001000000000 0x0000001000000000  RWE    0x0
>>
>> - Do you mean that the bug always puts two LOAD segments with the same
>>   phys_start in a row?
> Yes, as demonstrated in the example above.
>
> Thanks,
> Ming