[BUG] vmcore-dmesg can't read dmesg log from /proc/vmcore if log_buf is reallocated due to a large number of CPUs

Vadim Lomovtsev Vadim.Lomovtsev at caviumnetworks.com
Fri Oct 26 06:18:59 PDT 2018


Hi Bhupesh,

On Fri, Oct 26, 2018 at 03:49:11PM +0530, Bhupesh Sharma wrote:
> 
> Hi Vadim,
> On Fri, Oct 26, 2018 at 3:41 PM Vadim Lomovtsev
> <Vadim.Lomovtsev at caviumnetworks.com> wrote:
> >
> > Hi Bhupesh,
> >
> > On Fri, Oct 26, 2018 at 12:25:17PM +0530, Bhupesh Sharma wrote:
> > >
> > > Hi Vadim,
> > >
> > > On Thu, Oct 25, 2018 at 4:10 PM Vadim Lomovtsev
> > > <Vadim.Lomovtsev at caviumnetworks.com> wrote:
> > > >
> > > > Hello Bhupesh,
> > > >
> > > > On Thu, Oct 25, 2018 at 03:00:08AM +0530, Bhupesh Sharma wrote:
> > > > >
> > > > > Hello Vadim,
> > > > >
> > > > > On Wed, Oct 24, 2018 at 6:23 PM Lomovtsev, Vadim
> > > > > <Vadim.Lomovtsev at cavium.com> wrote:
> > > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > The following issue has been found in the vmcore-dmesg app with the latest release (94159bc3c264fa26395e56302072276a139d18af, 2.0.18-rc1) of kexec-tools on the CentOS 7.5 distro:
> > > > > >
> > > > > > On systems with a large number of CPUs (e.g. Cavium ThunderX2 has 224), log_buf gets reallocated by memblock_virt_alloc() in the setup_log_buf() routine (https://elixir.bootlin.com/linux/v4.16.18/source/kernel/printk/printk.c#L1108).
> > > > > >
> > > > > > Then, while dumping the vmcore, vmcore-dmesg can't find the dmesg log in the /proc/vmcore file and exits with the following message:
> > > > > >   Failed to read log text of size 0 bytes: Bad address
> > > > > >
> > > > > > However, it (the vmcore-dmesg app) properly reads the log_buf symbol, its address and eventually its value from /proc/vmcore, but then fails to find the dmesg data.
> > > > > >
> > > > > > At the same time, makedumpfile is able to find and extract the dmesg buffer from /proc/vmcore.
> > > > > > The makedumpfile binary comes with the kexec-tools-2.0.15-13.el7_5.2.aarch64 package.
> > > > > >
> > > > > > The issue does not reproduce on systems with a small number of CPUs, where log_buf is not reallocated to a memblock section.
> > > > >
> > > > > It seems you are hitting a known issue we also saw on Qualcomm Amberwing
> > > > > platforms.
> > > > > I have sent a patch series titled "kexec-tools/arm64: Add support to
> > > > > read PHYS_OFFSET from vmcoreinfo inside '/proc/kcore'" to this list
> > > > > just a few minutes back.
> > > > >
> > > > > I have Cc'ed you on the patchset as I think it might fix the issue for
> > > > > you.
> > > >
> > > > Got them, thank you.
> > > >
> > > > > Kindly try the patchset on your platform (Cavium?) and let me
> > > > > know if it fixes the issue for you.
> > > >
> > > > Sure, I'd like to check them on my side, but
> > > > I run into merge conflicts while trying to apply them onto
> > > > https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/
> > > > master (kexec-tools 2.0.18-rc1, 94159bc3c264fa26395e56302072276a139d18af).
> > >
> > > Hmm, that's strange, as I rebased them on kexec-tools 2.0.18-rc1
> > > (94159bc3c264fa26395e56302072276a139d18af)
> > > before sending out the patchset.
> > >
> > > > Is there a specific branch/revision they should be applied to?
> > > > (Or it might be my mail server mangling the email formatting.)
> > > >
> > >
> > > Can you please try picking them up from my public github tree instead?
> > > Here you can find the same:
> > > https://github.com/bhupesh-sharma/kexec-tools/tree/read-phys-offset-from-kcore-upstream-v1
> > >
> > > Please pick the top 2 commits from there.
> >
> > Applied them onto commit '94159bc kexec-tools 2.0.18-rc1'.
> >
> > I'm still getting the following error while saving dmesg with vmcore-dmesg:
> >
> > kdump: saving vmcore-dmesg.txt
> > Failed to read log text of size 0 bytes: Bad address
> > kdump: saving vmcore-dmesg.txt failed
> >
> > So far I have tried kernels 4.14.78 and 4.16.18.
> 
> You would need kernel 4.19-rc5 or above, as that version exposes VMCOREINFO
> in '/proc/kcore'.

So far, with 4.19-rc6 (and updated kexec and vmcore-dmesg, but with the kdump scripts from CentOS),
the crash kernel can't find sysroot and thus can't dump anything, so it times out and reboots the system.

> If you are having issues while switching to a newer kernel, please share
> the output(s) of the following on your platform:
> 
> # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
>

attached as kexec-start.log.xz

> and,
> 
> # readelf -l vmcore

[root@2sgbt-53 vlomovts]# readelf -l vmcore
readelf: vmcore: Error: No such file
[root@2sgbt-53 vlomovts]# uname -r
4.19.0-rc6+

> 
> and,
> 
> # cat /proc/iomem

attached as cat-proc-iomem.log.xz

WBR,
Vadim

> 
> Then I can suggest a hack, which you can try and test on your
> platform, and we can take it forward from there.
> 
> Thanks,
> Bhupesh
> 
> > >
> > > Thanks,
> > > Bhupesh
> > >
> > > >
> > > > >
> > > > > Thanks,
> > > > > Bhupesh
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cat-proc-iomem.log.xz
Type: application/x-xz
Size: 2172 bytes
Desc: cat-proc-iomem.log.xz
URL: <http://lists.infradead.org/pipermail/kexec/attachments/20181026/dc6f81a9/attachment.xz>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kexec-start.log.xz
Type: application/x-xz
Size: 2784 bytes
Desc: kexec-start.log.xz
URL: <http://lists.infradead.org/pipermail/kexec/attachments/20181026/dc6f81a9/attachment-0001.xz>

