[BUG] vmcore-dmesg can't read dmesg log from /proc/vmcore if log_buf is reallocated due to large number of CPUs

Bhupesh Sharma bhsharma at redhat.com
Fri Oct 26 16:11:55 PDT 2018


Hi Vadim,

On Fri, Oct 26, 2018 at 6:49 PM Vadim Lomovtsev
<Vadim.Lomovtsev at caviumnetworks.com> wrote:
>
> Hi Bhupesh,
>
> On Fri, Oct 26, 2018 at 03:49:11PM +0530, Bhupesh Sharma wrote:
> >
> > Hi Vadim,
> > On Fri, Oct 26, 2018 at 3:41 PM Vadim Lomovtsev
> > <Vadim.Lomovtsev at caviumnetworks.com> wrote:
> > >
> > > Hi Bhupesh,
> > >
> > > On Fri, Oct 26, 2018 at 12:25:17PM +0530, Bhupesh Sharma wrote:
> > > >
> > > > Hi Vadim,
> > > >
> > > > On Thu, Oct 25, 2018 at 4:10 PM Vadim Lomovtsev
> > > > <Vadim.Lomovtsev at caviumnetworks.com> wrote:
> > > > >
> > > > > Hello Bhupesh,
> > > > >
> > > > > On Thu, Oct 25, 2018 at 03:00:08AM +0530, Bhupesh Sharma wrote:
> > > > > >
> > > > > > Hello Vadim,
> > > > > >
> > > > > > On Wed, Oct 24, 2018 at 6:23 PM Lomovtsev, Vadim
> > > > > > <Vadim.Lomovtsev at cavium.com> wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > > The following issue has been found in the vmcore-dmesg app with the latest release (94159bc3c264fa26395e56302072276a139d18af 2.0.18-rc1) of kexec-tools on the CentOS 7.5 distro:
> > > > > > > >
> > > > > > > > On systems with a large number of CPUs (e.g. Cavium ThunderX2 has 224), the log_buf gets reallocated by memblock_virt_alloc() in the setup_log_buf routine (https://elixir.bootlin.com/linux/v4.16.18/source/kernel/printk/printk.c#L1108).
> > > > > > > >
> > > > > > > > Then, while dumping the vmcore, vmcore-dmesg can't find the dmesg log in the /proc/vmcore file and exits with the following message:
> > > > > > >   Failed to read log text of size 0 bytes: Bad address
> > > > > > >
> > > > > > > > However, it (the vmcore-dmesg app) properly reads the log_buf symbol, its address and eventually its value from /proc/vmcore, but then fails to find the dmesg data.
> > > > > > > >
> > > > > > > > At the same time, makedumpfile is able to find and extract the dmesg buffer from /proc/vmcore.
> > > > > > > > The makedumpfile comes with the kexec-tools-2.0.15-13.el7_5.2.aarch64 package.
> > > > > > > >
> > > > > > > > The issue is not reproduced on systems with a small number of CPUs, where log_buf is not reallocated to a memblock section.
> > > > > >
> > > > > > > Seems like you are hitting a known issue we saw on Qualcomm Amberwing
> > > > > > > platforms as well.
> > > > > > > I have sent a patch series titled "kexec-tools/arm64: Add support to
> > > > > > > read PHYS_OFFSET from vmcoreinfo inside '/proc/kcore'" to this list
> > > > > > > just a few minutes back.
> > > > > >
> > > > > > I have Cc'ed you to the patchset as I think it might fix the issue for
> > > > > > you.
> > > > >
> > > > > Got them, thank you.
> > > > >
> > > > > > Kindly try the patchset on your platform (cavium?) and let me
> > > > > > know if this fixes the issue for you.
> > > > >
> > > > > > Sure, I'd like to check them on my side, but
> > > > > > I ran into merge conflicts while trying to apply them onto
> > > > > https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/
> > > > > master, kexec-tools 2.0.18-rc1 94159bc3c264fa26395e56302072276a139d18af
> > > >
> > > > Hmm.. that's strange as I rebased them on kexec-tools 2.0.18-rc1
> > > > (94159bc3c264fa26395e56302072276a139d18af)
> > > > before sending out the patchset.
> > > >
> > > > > > Is there any specific branch/revision they should be applied to?
> > > > > > (Or it might be my mail server mangling the email formatting.)
> > > > >
> > > >
> > > > Can you please try picking them up from my public github tree instead?
> > > > Here you can find the same:
> > > > https://github.com/bhupesh-sharma/kexec-tools/tree/read-phys-offset-from-kcore-upstream-v1
> > > >
> > > > Please pick the top 2 commits from here.
> > >
> > > Applied them onto commit '94159bc kexec-tools 2.0.18-rc1'.
> > >
> > > Still getting the following error while saving dmesg with vmcore-dmesg:
> > >
> > > kdump: saving vmcore-dmesg.txt
> > > Failed to read log text of size 0 bytes: Bad address
> > > kdump: saving vmcore-dmesg.txt failed
> > >
> > > So far tried kernels 4.14.78, 4.16.18.
> >
> > You would need kernel 4.19-rc5 or above, as it exposes VMCOREINFO
> > in '/proc/kcore'.
>
> So far with 4.19-rc6 (and updated kexec and vmcore-dmesg, but with the kdump scripts from CentOS)
> the crash kernel can't find the sysroot and thus can't dump anything, so it times out and reboots the system.
>
> > If you are having issues while switching to newer kernel, please share
> > the output(s) of following on your platform:
> >
> > # kexec -p /boot/vmlinuz-`uname -r` --initrd=/boot/initramfs-`uname -r`.img --reuse-cmdline -d
> >
>
> attached as kexec-start.log.xz
>
> > and,
> >
> > # readelf -l vmcore
>
> [root at 2sgbt-53 vlomovts]# readelf -l vmcore
> readelf: vmcore: Error: No such file
> [root at 2sgbt-53 vlomovts]# uname -r
> 4.19.0-rc6+
>
> >
> > and,
> >
> > # cat /proc/iomem
>
> attached as cat-proc-iomem.log.xz

Just to confirm: these logs are from after you applied my kexec-tools patches, right?
It looks likely that we are seeing differences in the value of
'phys_offset' on your platforms:

From '/proc/iomem', we can see that phys_offset is 0x01400000:
01400000-ffedffff : System RAM

while the 'kexec -p -d' logs indicate that it is 0:
image_arm64_load: phys_offset:    0000000000000000

This tells me that the phys_offset value is not correctly calculated
in kexec-tools, which should be fixed after my patches.
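
For reference, the comparison above can be sketched in a few lines of
Python. This is only an illustration of the heuristic used in this mail
(lowest top-level 'System RAM' start in '/proc/iomem' versus the
'phys_offset' line printed by 'kexec -p -d'). It is not how kexec-tools
computes the value internally, and both function names are made up for
this example:

```python
import re

# Matches a top-level /proc/iomem line such as "01400000-ffedffff : System RAM".
IOMEM_LINE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def lowest_system_ram_start(iomem_text):
    """Hypothetical helper: lowest start address of a top-level
    'System RAM' range in /proc/iomem text, or None if there is none."""
    starts = []
    for line in iomem_text.splitlines():
        if line[:1].isspace():
            continue  # indented lines are child resources, skip them
        m = IOMEM_LINE.match(line)
        if m and m.group(3).strip() == "System RAM":
            starts.append(int(m.group(1), 16))
    return min(starts) if starts else None


def kexec_debug_phys_offset(log_text):
    """Hypothetical helper: value of the first 'phys_offset:' field found
    in 'kexec -p -d' debug output, or None if the field is absent."""
    m = re.search(r"phys_offset:\s*([0-9a-fA-F]+)", log_text)
    return int(m.group(1), 16) if m else None


# The two lines quoted in this mail.
iomem_sample = "01400000-ffedffff : System RAM\n"
kexec_sample = "image_arm64_load: phys_offset:    0000000000000000\n"

from_iomem = lowest_system_ram_start(iomem_sample)
from_kexec = kexec_debug_phys_offset(kexec_sample)
print(f"iomem: {from_iomem:#x}, kexec: {from_kexec:#x}, "
      f"match: {from_iomem == from_kexec}")
```

To check a live system, pass the contents of '/proc/iomem' (read as
root, otherwise the addresses are shown as zeros) and the saved
'kexec -p -d' output to the two functions.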

BTW, by '# readelf -l vmcore', I meant the 'vmcore' dump file you
have obtained via 'kexec'. It might be that you are saving it in some
different location (something like /var/crash?). Can you please try sharing
the output of the same as well?

Regards,
Bhupesh

> >
> > And then I can suggest a hack, which you can try and test on your
> > platform and then we can take it forward from there.
> >
> > Thanks,
> > Bhupesh
> >
> > > >
> > > > Thanks,
> > > > Bhupesh
> > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Bhupesh


