kexec reports "Cannot get kernel _text symbol address" on arm64 platform

Mon Aug 21 05:22:13 PDT 2023

> -----Original Message-----
> From: bhe at redhat.com <bhe at redhat.com>
> Sent: Monday, August 14, 2023 6:08 PM
> To: Pandey, Radhey Shyam <radhey.shyam.pandey at amd.com>
> Cc: piliu at redhat.com; kexec at lists.infradead.org;
> linux-kernel at vger.kernel.org; Sarangi, Anirudha <anirudha.sarangi at amd.com>
> Subject: Re: kexec reports "Cannot get kernel _text symbol address" on
> arm64 platform
> 
> On 08/11/23 at 01:27pm, Pandey, Radhey Shyam wrote:
> > > -----Original Message-----
> > > From: bhe at redhat.com <bhe at redhat.com>
> > > Sent: Wednesday, August 9, 2023 7:42 AM
> > > To: Pandey, Radhey Shyam <radhey.shyam.pandey at amd.com>;
> > > piliu at redhat.com
> > > Cc: kexec at lists.infradead.org; linux-kernel at vger.kernel.org
> > > Subject: Re: kexec reports "Cannot get kernel _text symbol address"
> > > on
> > > arm64 platform
> > >
> > > On 08/08/23 at 07:17pm, Pandey, Radhey Shyam wrote:
> > > > Hi,
> > > >
> > > > I am trying to bring up kdump on an arm64 platform[1], but I get
> > > > "Cannot get kernel _text symbol address".
> > > >
> > > > Are there some dump-capture kernel config options that I am missing?
> > > >
> > > > FYI, the complete kexec debug log is copied below.
> > > >
> > > > [1]: https://www.xilinx.com/products/boards-and-kits/vck190.html
> > >
> > > Your description isn't clear. You saw the message, but did your kdump
> > > kernel loading succeed or not?
> > >
> > > If not, have you tried applying Pingfan's patchset, and do you still see the issue?
> > >
> > > [PATCHv7 0/5] arm64: zboot support
> > > https://lore.kernel.org/all/20230803024152.11663-1-piliu@redhat.com/T/#u
> >
> > I was able to proceed further and load the crash kernel by triggering
> > a system crash:
> > echo c > /proc/sysrq-trigger
> >
> > But when I copy /proc/vmcore it throws a memory abort. Also, the size of
> > /proc/vmcore is really huge (18446603353488633856).
> > Any possible guess on what could be wrong?
> 
> I didn't reproduce this issue on an arm64 bare-metal system with the latest
> kernel. From the log, it could be the iov_iter conversion patch which caused
> this. Can you revert the patch below to see if it works?
> 
> 5d8de293c224 vmcore: convert copy_oldmem_page() to take an iov_iter

Reverting this commit resulted in a lot of conflicts, so to be on the safe side
I checked out the v5.18 kernel, which predates the above commit. I still see the
same issue.

/ # ls -lrth /proc/vmcore 
-r--------    1 root     root       16.0E Aug 21 12:16 /proc/vmcore
/ # dmesg | grep -i 5.18
[    0.000000] Linux version 5.18.0-00001-g689fdf110e63-dirty (radheys at xhdradheys41) (aarch64-xilinx-linux-gcc.real (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #37 SMP Mon Aug 21 17:38:24 IST 2023
[    2.494578] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.18
[    2.514941] usb usb1: Manufacturer: Linux 5.18.0-00001-g689fdf110e63-dirty xhci-hcd
[    2.555265] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.18
[    2.575621] usb usb2: Manufacturer: Linux 5.18.0-00001-g689fdf110e63-dirty xhci-hcd
[    3.152182] usb 1-1: new high-speed USB device number 2 using xhci-hcd
/ # cp /proc/vmcore dump
[   86.204704] Unable to handle kernel level 3 address size fault at virtual address ffff800009b75000
[   86.213677] Mem abort info:
[   86.216464]   ESR = 0x96000003
[   86.219508]   EC = 0x25: DABT (current EL), IL = 32 bits
[   86.224812]   SET = 0, FnV = 0
[   86.227856]   EA = 0, S1PTW = 0
[   86.230989]   FSC = 0x03: level 3 address size fault
[   86.235945] Data abort info:
[   86.238819]   ISV = 0, ISS = 0x00000003
[   86.242646]   CM = 0, WnR = 0
[   86.245608] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000043049000
[   86.252304] [ffff800009b75000] pgd=1000000061ffe003, p4d=1000000061ffe003, pud=1000000061ffd003, pmd=1000000043c12003, pte=00687ffff8200703
[   86.264828] Internal error: Oops: 96000003 [#1] SMP
[   86.269696] Modules linked in:
[   86.272741] CPU: 1 PID: 298 Comm: cp Not tainted 5.18.0-00001-g689fdf110e63-dirty #37
[   86.280562] Hardware name: Xilinx Versal vck190 Eval board revA (DT)
[   86.286905] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   86.293857] pc : __arch_copy_to_user+0x180/0x240
[   86.298475] lr : copy_oldmem_page+0xa8/0x110
[   86.302738] sp : ffff80000af6bc50
[   86.306041] x29: ffff80000af6bc50 x28: ffff8000097de3b0 x27: ffff8000097de228
[   86.313170] x26: 0000000000000000 x25: ffff80000af6bd60 x24: 0000000000000000
[   86.320299] x23: ffff800009b75000 x22: 0000000000000001 x21: 0000ffffffa1e5a8
[   86.327427] x20: 0000000000000000 x19: 0000000000001000 x18: 0000000000000000
[   86.334555] x17: 0000000000000000 x16: 0000000000000000 x15: ffff800009b75000
[   86.341682] x14: ffff800009863568 x13: 0000000000000000 x12: ffff800008000000
[   86.348810] x11: 00007ffff8201000 x10: ffff800009b75fff x9 : 0000000000000000
[   86.355937] x8 : ffff800009b76000 x7 : 0400000000000001 x6 : 0000ffffffa1e5a8
[   86.363065] x5 : 0000ffffffa1f5a8 x4 : 0000000000000000 x3 : 0000ffffffffffff
[   86.370192] x2 : 0000000000000f80 x1 : ffff800009b75000 x0 : 0000ffffffa1e5a8
[   86.377320] Call trace:
[   86.379755]  __arch_copy_to_user+0x180/0x240
[   86.384018]  read_from_oldmem.part.0+0x160/0x1f4
[   86.388629]  read_vmcore+0xe4/0x214
[   86.392109]  proc_reg_read+0xb0/0x100
[   86.395763]  vfs_read+0x90/0x1dc
[   86.398981]  ksys_read+0x70/0x10c
[   86.402286]  __arm64_sys_read+0x20/0x30
[   86.406111]  invoke_syscall+0x54/0x124
[   86.409852]  el0_svc_common.constprop.0+0x44/0xec
[   86.414547]  do_el0_svc+0x70/0x90
[   86.417853]  el0_svc+0x50/0xa4
[   86.420899]  el0t_64_sync_handler+0x10c/0x140
[   86.425247]  el0t_64_sync+0x18c/0x190
[   86.428902] Code: d503201f d503201f d503201f d503201f (a8c12027) 
[   86.434984] ---[ end trace 0000000000000000 ]---
Segmentation fault
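
The abort details and the vmcore size above can be decoded with a few lines
of arithmetic. The sketch below is only an illustration: the constants are
copied from the 5.18 log and the earlier `ls -l` output, the helper names are
made up for this example, and the PTE mask assumes the 4k-page, 48-bit layout
that the "swapper pgtable" line reports.

```python
# Values copied from the logs in this thread.
ESR = 0x96000003                     # "ESR = 0x96000003"
PTE = 0x00687FFFF8200703             # "pte=00687ffff8200703"
VMCORE_SIZE = 18446603353488633856   # size shown by "ls -l /proc/vmcore"


def decode_esr(esr):
    """Split an ESR_EL1 value into exception class, IL bit and fault status code."""
    return {
        "ec": (esr >> 26) & 0x3F,   # bits [31:26]
        "il": (esr >> 25) & 0x1,    # bit 25
        "dfsc": esr & 0x3F,         # bits [5:0] of the ISS for a data abort
    }


def pte_output_address(pte):
    """Output address of a level 3 descriptor.

    Assumption: 4k granule with 48-bit addresses, so the address is in bits [47:12].
    """
    return pte & 0x0000FFFFFFFFF000


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    fields = decode_esr(ESR)
    print("EC   =", hex(fields["ec"]))
    print("IL   =", fields["il"])
    print("DFSC =", hex(fields["dfsc"]))

    pa = pte_output_address(PTE)
    print("PTE output address =", hex(pa), "needs", pa.bit_length(), "bits")

    print("vmcore size (hex)    =", hex(VMCORE_SIZE))
    print("vmcore size (signed) =", as_signed64(VMCORE_SIZE))
```

The decoded EC (0x25) and fault status code (0x03) match what the kernel
printed. The PTE's output address needs 47 bits, which would be consistent
with an address size fault if the configured physical address range is
smaller than that. The reported size is 0xffff800405383000 in hex, which
looks more like a kernel-style address or a wrapped value than a plausible
file length. This is only an observation, not a diagnosis.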

> 
> >
> >
> > [   80.733523] Starting crashdump kernel...
> > [   80.737435] Bye!
> > [    0.000000] Booting Linux on physical CPU 0x0000000001 [0x410fd083]
> > [    0.000000] Linux version 6.5.0-rc4-ge28001fb4e07 (radheys at xhdradheys41) (aarch64-xilinx-linux-gcc.real (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0.20220819) #23 SMP Fri Aug 11 16:25:34 IST 2023
> > <snip>
> >
> >
> >
> > xilinx-vck190-20232:/run/media/mmcblk0p1# cat /proc/meminfo | head
> > MemTotal:        2092876 kB
> > MemFree:         1219928 kB
> > MemAvailable:    1166004 kB
> > Buffers:              32 kB
> > Cached:           756952 kB
> > SwapCached:            0 kB
> > Active:             1480 kB
> > Inactive:          24164 kB
> > Active(anon):       1452 kB
> > Inactive(anon):    24160 kB
> > xilinx-vck190-20232:/run/media/mmcblk0p1# cp /proc/vmcore dump
> > [  975.284865] Unable to handle kernel level 3 address size fault at virtual address ffff80008d7cf000
> > [  975.293871] Mem abort info:
> > [  975.296669]   ESR = 0x0000000096000003
> > [  975.300425]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [  975.305738]   SET = 0, FnV = 0
> > [  975.308788]   EA = 0, S1PTW = 0
> > [  975.311925]   FSC = 0x03: level 3 address size fault
> > [  975.316888] Data abort info:
> > [  975.319763]   ISV = 0, ISS = 0x00000003, ISS2 = 0x00000000
> > [  975.325245]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > [  975.330292]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [  975.335599] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000005016ef6b000
> > [  975.342297] [ffff80008d7cf000] pgd=10000501eddfe003, p4d=10000501eddfe003, pud=10000501eddfd003, pmd=100005017b695003, pte=00687fff84000703
> > [  975.354827] Internal error: Oops: 0000000096000003 [#4] SMP
> > [  975.360392] Modules linked in:
> > [  975.363440] CPU: 0 PID: 664 Comm: cp Tainted: G      D            6.5.0-rc4-ge28001fb4e07 #23
> > [  975.372822] Hardware name: Xilinx Versal vck190 Eval board revA (DT)
> > [  975.379165] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [  975.386119] pc : __memcpy+0x110/0x230
> > [  975.389783] lr : _copy_to_iter+0x3d8/0x4d0
> > [  975.393874] sp : ffff80008dc939a0
> > [  975.397178] x29: ffff80008dc939a0 x28: ffff05013c1bea30 x27: 0000000000001000
> > [  975.404309] x26: 0000000000001000 x25: 0000000000001000 x24: ffff80008d7cf000
> > [  975.411440] x23: 0000040000000000 x22: ffff80008dc93ba0 x21: 0000000000001000
> > [  975.418570] x20: ffff000000000000 x19: 0000000000001000 x18: 0000000000000000
> > [  975.425699] x17: 0000000000000000 x16: 0000000000000000 x15: 0140000000000000
> > [  975.432829] x14: ffff8500a9919000 x13: 0040000000000001 x12: 0000fffef6831000
> > [  975.439958] x11: ffff80008d9cf000 x10: 0000000000000000 x9 : 0000000000000000
> > [  975.447088] x8 : ffff80008d7d0000 x7 : ffff0501addfd358 x6 : 0400000000000001
> > [  975.454217] x5 : ffff0501370e9000 x4 : ffff80008d7d0000 x3 : 0000000000000000
> > [  975.461346] x2 : 0000000000001000 x1 : ffff80008d7cf000 x0 : ffff0501370e8000
> > [  975.468476] Call trace:
> > [  975.470912]  __memcpy+0x110/0x230
> > [  975.474221]  copy_oldmem_page+0x70/0xac
> > [  975.478050]  read_from_oldmem.part.0+0x120/0x188
> > [  975.482663]  read_vmcore+0x14c/0x238
> > [  975.486231]  proc_reg_read_iter+0x84/0xd8
> > [  975.490233]  copy_splice_read+0x160/0x288
> > [  975.494236]  vfs_splice_read+0xac/0x10c
> > [  975.498063]  splice_direct_to_actor+0xa4/0x26c
> > [  975.502498]  do_splice_direct+0x90/0xdc
> > [  975.506325]  do_sendfile+0x344/0x454
> > [  975.509892]  __arm64_sys_sendfile64+0x134/0x140
> > [  975.514415]  invoke_syscall+0x54/0x124
> > [  975.518157]  el0_svc_common.constprop.0+0xc4/0xe4
> > [  975.522854]  do_el0_svc+0x38/0x98
> > [  975.526162]  el0_svc+0x2c/0x84
> > [  975.529211]  el0t_64_sync_handler+0x100/0x12c
> > [  975.533562]  el0t_64_sync+0x190/0x194
> > [  975.537218] Code: cb01000e b4fffc2e eb0201df 540004a3 (a940342c)
> > [  975.543302] ---[ end trace 0000000000000000 ]---
> >
> > Broadcast message from systemd-journald at xilinx-vck190-20232 (Tue 2022-11-08 14:16:20 UTC):
> >
> > kernel[539]: [  975.354827] Internal error: Oops: 0000000096000003 [#4] SMP
> >
> >
> > Broadcast message from systemd-journald at xilinx-vck190-20232 (Tue 2022-11-08 14:16:20 UTC):
> >
> > kernel[539]: [  975.537218] Code: cb01000e b4fffc2e eb0201df 540004a3 (a940342c)
> >
> > Segmentation fault
> > xilinx-vck190-20232:/run/media/mmcblk0p1# ls -lrth /proc/vmcore
> > -r--------    1 root     root       16.0E Nov  8 14:05 /proc/vmcore
> > xilinx-vck190-20232:/run/media/mmcblk0p1# ls -lh /proc/vmcore
> > -r--------    1 root     root       16.0E Nov  8 14:05 /proc/vmcore
> > xilinx-vck190-20232:/run/media/mmcblk0p1# ls -l /proc/vmcore
> > -r--------    1 root     root     18446603353488633856 Nov  8 14:05 /proc/vmcore
> >
> > Thanks,
> > Radhey
> >