EXT: RE: crash: read error on type: "memory section root table"

Fri Jul 22 05:04:30 PDT 2022

Hello,

Back to this topic.

I upgraded our system to the kexec-tools package from CentOS Stream 8, which is based on kexec 2.0.24 and makedumpfile 1.7.1.
We are still facing errors when using 'makedumpfile -c'.

Removing '-c' gives a better success/failure ratio, but the resulting crash file still sometimes cannot be read by the crash tool.

Following Hagio's remark below about sync, I added a sync call before invoking makedumpfile (just after the ext4 mounts of the required partitions) and a second sync call after makedumpfile returns.
In that configuration, the crash tool has been able to read the crash file in all cases so far.
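
For illustration, the sequence described above (sync after the mounts, run makedumpfile, sync again) could be sketched as the shell helper below. This is only a sketch under assumptions: the function name dump_with_sync is hypothetical, and the actual init script is not shown in this thread.

```sh
#!/bin/sh
# Hypothetical helper: flush pending writes, run the dump command passed
# as arguments, flush again, and return the dump command's exit status.
dump_with_sync() {
    sync            # flush after the partitions have been mounted
    "$@"            # the dump command, e.g. makedumpfile with its arguments
    rc=$?
    sync            # flush the freshly written dumpfile before going on
    return "$rc"
}
```

It would be called as, for example, "dump_with_sync makedumpfile -d 31 /proc/vmcore /mnt/dump/crashdump", where the dump level and the target path are placeholders, not values taken from this thread.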

Thanks for your help.
Best regards,
Patrick Agrain

-----Original Message-----
From: Crash-utility <crash-utility-bounces at redhat.com> On Behalf Of Agrain Patrick
Sent: Wednesday, April 6, 2022 17:48
To: Discussion list for crash utility usage, maintenance and development <crash-utility at redhat.com>; kexec at lists.infradead.org
Subject: Re: [Crash-utility] EXT: RE: crash: read error on type: "memory section root table"

-----Original Message-----
From: HAGIO KAZUHITO(萩尾　一仁) <k-hagio-ab at nec.com>
Sent: Wednesday, April 6, 2022 09:48
To: Agrain Patrick <patrick.agrain at al-enterprise.com>
Cc: Discussion list for crash utility usage, maintenance and development <crash-utility at redhat.com>; kexec at lists.infradead.org
Subject: RE: EXT: RE: crash: read error on type: "memory section root table"

-----Original Message-----
> Hello,
> 
> The trace suggested above gives the following information after a 'crash -d 8' command:
> <...>
> kernel NR_CPUS: 2
> <readmem: ffffffffa4925820, KVADDR, "high_memory", 8, (FOE), 56017b542648>
> <read_diskdump: addr: ffffffffa4925820 paddr: 12925820 cnt: 8>
> read_diskdump: paddr/pfn: 12925820/12925 -> cache physical page: 12925000
> GETBUF(328 -> 0)
> FREEBUF(0)
> GETBUF(328 -> 0)
> FREEBUF(0)
> PAGESIZE=4096
> mem_section_size = 16384
> NR_SECTION_ROOTS = 2048
> NR_MEM_SECTIONS = 524288
> SECTIONS_PER_ROOT = 256
> SECTION_ROOT_MASK = 0xff
> PAGES_PER_SECTION = 32768
> <readmem: ffffffffa4926db0, KVADDR, "mem_section", 8, (FOE), 7ffd1b6bb000>
> <read_diskdump: addr: ffffffffa4926db0 paddr: 12926db0 cnt: 8>
> read_diskdump: paddr/pfn: 12926db0/12926 -> cache physical page: 12926000
> <readmem: ffff904c7f7fc000, KVADDR, "memory section root table", 16384, (FOE), 56017da26fd0>
> <read_diskdump: addr: ffff904c7f7fc000 paddr: 3f7fc000 cnt: 4096>
> read_diskdump: paddr/pfn: 3f7fc000/3f7fc -> cache physical page: 3f7fc000
> crash: PAG3 - errno=2 r=0 pd.size=49
> read_diskdump: READ_ERROR: cannot cache page: 3f7fc000
> crash: read error: kernel virtual address: ffff904c7f7fc000  type: "memory section root table"

Hmm, r=0 means end of file. Can you check again whether pd.offset exceeds the dumpfile size? If so, the dumpfile is somehow shorter than expected.

Indeed, the offset points beyond the end of the dumpfile. I get:
crash: PAG3 - errno=2 r=0 pd.size=52 pd.offset=168956485
with a dumpfile of 168775680 bytes:
164820 -rw-r--r--.  1 root root  168775680  6 avril 17:23 crashdump--20220406-1713

And another one:
crash: PAG3 - errno=2 r=0 pd.size=49 pd.offset=215640649
with a dumpfile of 215023616 bytes:
209984 -rw-r--r--.  1 root root  215023616  1 avril 10:58 crashdump-585.000-20220401-1054
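
The check requested above is simple arithmetic: reading the page needs the file to extend at least to pd.offset + pd.size bytes. A minimal shell sketch of that comparison follows, using the figures reported in this message. The helper name dump_shortfall is hypothetical and is not part of crash or makedumpfile.

```sh
#!/bin/sh
# dump_shortfall FILE_SIZE PD_OFFSET PD_SIZE
# Prints how many bytes the dumpfile is missing for the page to be
# readable, or 0 if the page data lies entirely within the file.
dump_shortfall() {
    file_size=$1
    page_end=$(( $2 + $3 ))   # first byte past the page data
    if [ "$page_end" -gt "$file_size" ]; then
        echo $(( page_end - file_size ))
    else
        echo 0
    fi
}

dump_shortfall 168775680 168956485 52   # first dumpfile: prints 180857
dump_shortfall 215023616 215640649 49   # second dumpfile: prints 617082
```

Both results are non-zero, so in both cases the page descriptor points past the end of the file. On a system with GNU coreutils, the file size argument could be taken from "stat -c %s" on the dumpfile.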

I think a RHEL-based kexec-tools does "sync" after makedumpfile, but can you check?

Actually, we run makedumpfile from a script that serves as the init program of the second kernel. Therefore, the sync that is done when using core_collector is not performed in our case.

Thanks,
Kazu

Best regards,
Patrick

--
Crash-utility mailing list
Crash-utility at redhat.com
https://listman.redhat.com/mailman/listinfo/crash-utility
Contribution Guidelines: https://github.com/crash-utility/crash/wiki