[PATCH makedumpfile] Fix incorrect PFN exclusion when LOAD segments overlap

Ming Wang wangming01 at loongson.cn
Tue Oct 8 01:43:31 PDT 2024


Hi Kazu,

Thank you for your detailed questions.

On 9/30/24 16:18, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Ming,
> 
> sorry for the late response to this patch.
> 
> On 2024/08/14 11:43, Ming Wang wrote:
>> When iterating through LOAD segments to exclude pages in
>> `exclude_nodata_pages`, simply using `phys_end` to track allocated
>> PFNs can lead to the erroneous exclusion of valid pages from a
>> subsequent LOAD segment if its `phys_start` overlaps with the
>> previous segment's address range.
>>
>> This patch addresses the issue by checking for such overlaps with the
>> next LOAD segment's physical address range. If an overlap is
>> detected, we continue including PFNs from the current segment until
>> reaching the end of the overlapping region.
>>
>> This fix ensures that all valid pages within overlapping LOAD segments
>> are correctly included during PFN exclusion.
> 
> I have a few questions:
> 
> - I heard from Masa that the overlap is caused by a bug, isn't it fixed 
> on the kernel side?  or it will be fixed but is this patch for existing 
> kernels that have the bug?
The overlap you mentioned is not caused by a kernel bug and is not expected to be fixed in the kernel.
> 
> - exclude_nodata_pages() is for dumpfiles created by "makedumpfile -E", 
> which uses p_filesz in program headers to record existing pages after 
> exclusion.  The function excludes pages only between 
> phys_start+file_size and phys_end, so if p_filesz == p_memsz, it will 
> not exclude any pages.
My system's page size is 16K. Consider the following LOAD segment information:

           phys_start     phys_end   virt_start         virt_end
LOAD[ 0]       200000      2186200   9000000000200000   9000000002186200
LOAD[ 1]       200000      ee00000   9000000000200000   900000000ee00000

I added some print statements within the exclude_nodata_pages function:

while (pfn < pfn_end) {
    ERRMSG("\nfunc:%s: pfn: %#llx phys_start: %#llx phys_end: %#llx file_size: %lu pfn_end:%#llx\n"
        , __func__, pfn, phys_start, phys_end, (unsigned long)file_size, pfn_end);
    clear_bit_on_2nd_bitmap(pfn, cycle);
    ++pfn;
}

The following output was observed:

func:exclude_nodata_pages: pfn: 0x861 phys_start: 0x200000 phys_end: 0x2186200 file_size: 33055232 pfn_end:0x862

After this print statement, clear_bit_on_2nd_bitmap() is called, clearing the bitmap bit for
pfn 0x861. However, pfn 0x861 is a valid PFN that happens to hold the kernel's init_uts_ns,
which leads to an error when the crash tool attempts to detect the kernel version:

read_diskdump: PAGE_EXCLUDED: paddr/pfn: 2185d98/861
crash: page excluded: kernel virtual address: 9000000002185d98  type: "init_uts_ns" 
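To spell out why pfn 0x861 is hit: with 16K pages, the page containing phys_start + file_size of LOAD[0] is the very page that holds init_uts_ns. A standalone sketch of the arithmetic (paddr_to_pfn and the rounding of pfn_end are my assumptions, chosen to match the log line above, not makedumpfile's exact macros):

```c
#include <assert.h>
#include <stdio.h>

/* 16K pages => 14-bit page shift (matches the system above). */
#define PAGESHIFT 14
#define PAGESIZE  (1UL << PAGESHIFT)

static unsigned long paddr_to_pfn(unsigned long paddr)
{
	return paddr >> PAGESHIFT;
}

static void demo(void)
{
	/* LOAD[0] from the example: */
	unsigned long phys_start = 0x200000UL;
	unsigned long file_size  = 33055232UL;   /* 0x1f86200 */
	unsigned long phys_end   = 0x2186200UL;

	/* Exclusion starts at the page containing phys_start + file_size... */
	unsigned long pfn = paddr_to_pfn(phys_start + file_size);

	/* ...and, rounding phys_end up to a page boundary, ends at pfn_end. */
	unsigned long pfn_end = paddr_to_pfn(phys_end + PAGESIZE - 1);

	/* init_uts_ns sits at paddr 0x2185d98, i.e. inside that same 16K page. */
	unsigned long uts_pfn = paddr_to_pfn(0x2185d98UL);

	printf("pfn=%#lx pfn_end=%#lx uts_pfn=%#lx\n", pfn, pfn_end, uts_pfn);
	assert(pfn == 0x861UL && pfn_end == 0x862UL && uts_pfn == 0x861UL);
}
```

So the loop body runs exactly once, for pfn 0x861, and clears the page that LOAD[1] still needs.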
> I would like to check how they are on your machine, so could I have a 
> "readelf -l /proc/vmcore" output?
The output of readelf -l /proc/vmcore on my system is provided below.

Note that it was captured on a different kernel version from the one used in the preceding example:

[root@localhost ~]# readelf -l /proc/vmcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 8 program headers, starting at offset 64

Program Headers:
Type           Offset             VirtAddr           PhysAddr
               FileSiz            MemSiz              Flags  Align
NOTE           0x0000000000004000 0x0000000000000000 0x0000000000000000
               0x0000000000004a54 0x0000000000004a54         0x4
LOAD           0x000000000000c000 0x9000000000200000 0x0000000000200000
               0x0000000002550400 0x0000000002550400  RWE    0x0
LOAD           0x0000000002560000 0x9000000000200000 0x0000000000200000
               0x000000000ec00000 0x000000000ec00000  RWE    0x0
LOAD           0x0000000011160000 0x9000000090400000 0x0000000090400000
               0x0000000011200000 0x0000000011200000  RWE    0x0
LOAD           0x0000000022360000 0x90000000f8e00000 0x00000000f8e00000
               0x00000000000c0000 0x00000000000c0000  RWE    0x0
LOAD           0x0000000022420000 0x90000000f8ed0000 0x00000000f8ed0000
               0x0000000004c70000 0x0000000004c70000  RWE    0x0
LOAD           0x0000000027090000 0x90000000fe100000 0x00000000fe100000
               0x0000000f81f00000 0x0000000f81f00000  RWE    0x0
LOAD           0x0000000fa8f90000 0x9000100080000000 0x0000100080000000
               0x0000001000000000 0x0000001000000000  RWE    0x0
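For what it's worth, the overlap is visible directly in these headers: LOAD[0] covers physical addresses [0x200000, 0x200000 + 0x2550400) while LOAD[1] starts at the same 0x200000. A minimal interval check that flags such pairs (struct and function names are mine, not makedumpfile's):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical segment record; makedumpfile's own segment struct differs. */
struct seg {
	unsigned long phys_start;
	unsigned long phys_end;   /* phys_start + MemSiz */
};

/* True if the two half-open physical ranges [start, end) intersect. */
static bool segs_overlap(const struct seg *a, const struct seg *b)
{
	return a->phys_start < b->phys_end && b->phys_start < a->phys_end;
}
```

Applied to the headers above, LOAD[0] and LOAD[1] overlap, while LOAD[1] and LOAD[2] (which starts at 0x90400000) do not.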
> 
> - Do you mean that the bug always puts the two LOAD segments that have 
> the same phys_start in a row?
Yes, as demonstrated in the example above.
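The shape of the fix can be sketched as follows; all names here are illustrative, and where the actual patch checks only the immediately following LOAD segment, this sketch generalizes the same idea: before clearing a pfn in one segment's nodata range, check whether its page still falls inside another segment's on-file data region [phys_start, phys_start + p_filesz), and if so keep it.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGESHIFT 14  /* 16K pages, as on the system above */

/* Illustrative segment record, not makedumpfile's actual struct. */
struct load_seg {
	unsigned long phys_start;
	unsigned long phys_end;
	unsigned long file_size;
};

/*
 * True if the page of 'pfn' intersects some other segment's data region
 * [phys_start, phys_start + file_size); such a page must not be excluded.
 */
static bool pfn_covered_elsewhere(unsigned long pfn,
				  const struct load_seg *segs,
				  int nseg, int self)
{
	unsigned long page_start = pfn << PAGESHIFT;
	unsigned long page_end = page_start + (1UL << PAGESHIFT);

	for (int i = 0; i < nseg; i++) {
		if (i == self)
			continue;
		if (page_start < segs[i].phys_start + segs[i].file_size &&
		    segs[i].phys_start < page_end)
			return true;
	}
	return false;
}
```

With the two segments from the first example, pfn 0x861 (the page holding init_uts_ns) is reported as covered by LOAD[1]'s data region and therefore stays set in the bitmap.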

Thanks,
Ming



