Implementing .shutdown method for efa module
Tao Liu
ltao at redhat.com
Fri Mar 29 04:58:02 PDT 2024
Hi Michael,
Sorry for the late reply. We spent some time reproducing the issue on
the upstream kernel 6.9.0-rc1. I have added our QE (libhe and xiliang) to
the CC list; they helped run the test with their test program.
On Tue, Mar 26, 2024 at 8:35 PM Margolin, Michael <mrgolin at amazon.com> wrote:
>
> Hi Tao,
>
> Thanks for bringing this up.
>
> I've unsuccessfully tried to reproduce this kernel panic using
> production Red Hat 9.3 AMI (5.14.0-362.18.1.el9_3.aarch64).
>
> Are there any related changes in the kernel you are testing?
We reproduced the issue on upstream kernel 6.9.0-rc1; please see the
dmesg log below. Note, however, that this time the suspicious "IRQ 191: no
longer affine to CPU7" message does not appear among the "psci: CPUXX
killed" lines:
[ 5.722007] systemd[1]: modprobe@fuse.service: Deactivated successfully.
[ 5.722903] systemd[1]: Finished Load Kernel Module fuse.
[ 5.723705] systemd[1]: modprobe@drm.service: Deactivated successfully.
[ 5.724421] systemd[1]: Finished Load Kernel Module drm.
[ 5.725366] systemd[1]: Finished Read and set NIS domainname from
/etc/sysconfig/network.
[ 5.726556] systemd[1]: Finished Load Kernel Modules.
[ 5.727338] systemd[1]: Finished Generate network units from Kernel
command line.
[ 5.728215] systemd[1]: Started Journal Service.
[ 5.741469] systemd-journald[1198]: Received client request to
flush runtime journal.
[ 6.144313] efa 0000:00:1b.0: enabling device (0010 -> 0012)
[ 6.312684] efa 0000:00:1b.0: Setup irq:191 name:efa-mgmnt@pci:0000:00:1b.0
[ 6.335149] efa 0000:00:1b.0 efa_0: IB device registered
[ 6.360319] XFS (nvme3n1p2): Mounting V5 Filesystem
d7003ecc-db6f-4bfb-bf92-60376b6a6563
[ 6.386816] XFS (nvme3n1p2): Ending clean mount
[ 10.229126] block nvme3n1: the capability attribute has been deprecated.
[ 10.557952] PEFILE: Unsigned PE binary
Red Hat Enterprise Linux 9.4 Beta (Plow)
Kernel 6.9.0-rc1 on an aarch64
ip-10-0-20-120 login: [ 15.349910] PEFILE: Unsigned PE binary
[ 19.601416] kexec_core: Starting new kernel
[ 19.700609] psci: CPU1 killed (polled 0 ms)
[ 19.750454] psci: CPU2 killed (polled 0 ms)
[ 19.800416] psci: CPU3 killed (polled 0 ms)
[ 19.870431] psci: CPU4 killed (polled 0 ms)
[ 19.930427] psci: CPU5 killed (polled 0 ms)
[ 20.000415] psci: CPU6 killed (polled 0 ms)
[ 20.060417] psci: CPU7 killed (polled 0 ms)
[ 20.150404] psci: CPU8 killed (polled 0 ms)
[ 20.240416] psci: CPU9 killed (polled 0 ms)
[ 20.310424] psci: CPU10 killed (polled 0 ms)
[ 20.380418] psci: CPU11 killed (polled 0 ms)
[ 20.440418] psci: CPU12 killed (polled 0 ms)
[ 20.510406] psci: CPU13 killed (polled 0 ms)
[ 20.570404] psci: CPU14 killed (polled 0 ms)
[ 20.670406] psci: CPU15 killed (polled 0 ms)
[ 20.730487] psci: CPU16 killed (polled 0 ms)
[ 20.790421] psci: CPU17 killed (polled 0 ms)
[ 20.890428] psci: CPU18 killed (polled 0 ms)
[ 20.940423] psci: CPU19 killed (polled 0 ms)
[ 20.990427] psci: CPU20 killed (polled 0 ms)
[ 21.040426] psci: CPU21 killed (polled 0 ms)
[ 21.090423] psci: CPU22 killed (polled 0 ms)
[ 21.140406] psci: CPU23 killed (polled 0 ms)
[ 21.210414] psci: CPU24 killed (polled 0 ms)
[ 21.260407] psci: CPU25 killed (polled 0 ms)
[ 21.320410] psci: CPU26 killed (polled 0 ms)
[ 21.380412] psci: CPU27 killed (polled 0 ms)
[ 21.430408] psci: CPU28 killed (polled 0 ms)
[ 21.490407] psci: CPU29 killed (polled 0 ms)
[ 21.540396] psci: CPU30 killed (polled 0 ms)
[ 21.590385] psci: CPU31 killed (polled 0 ms)
[ 21.640416] psci: CPU32 killed (polled 0 ms)
[ 21.700411] psci: CPU33 killed (polled 0 ms)
[ 21.750420] psci: CPU34 killed (polled 0 ms)
[ 21.800408] psci: CPU35 killed (polled 0 ms)
[ 21.850417] psci: CPU36 killed (polled 0 ms)
[ 21.900411] psci: CPU37 killed (polled 0 ms)
[ 21.960400] psci: CPU38 killed (polled 0 ms)
[ 22.010399] psci: CPU39 killed (polled 0 ms)
[ 22.060401] psci: CPU40 killed (polled 0 ms)
[ 22.110393] psci: CPU41 killed (polled 0 ms)
[ 22.160398] psci: CPU42 killed (polled 0 ms)
[ 22.210407] psci: CPU43 killed (polled 0 ms)
[ 22.260392] psci: CPU44 killed (polled 0 ms)
[ 22.320386] psci: CPU45 killed (polled 0 ms)
[ 22.370388] psci: CPU46 killed (polled 0 ms)
[ 22.420815] psci: CPU47 killed (polled 0 ms)
[ 22.470402] psci: CPU48 killed (polled 0 ms)
[ 22.530398] psci: CPU49 killed (polled 0 ms)
[ 22.600393] psci: CPU50 killed (polled 0 ms)
[ 22.650395] psci: CPU51 killed (polled 0 ms)
[ 22.700393] psci: CPU52 killed (polled 0 ms)
[ 22.750405] psci: CPU53 killed (polled 0 ms)
[ 22.800388] psci: CPU54 killed (polled 0 ms)
[ 22.850397] psci: CPU55 killed (polled 0 ms)
[ 22.900396] psci: CPU56 killed (polled 0 ms)
[ 22.960392] psci: CPU57 killed (polled 0 ms)
[ 23.010412] psci: CPU58 killed (polled 0 ms)
[ 23.060426] psci: CPU59 killed (polled 0 ms)
[ 23.110424] psci: CPU60 killed (polled 0 ms)
[ 23.160433] psci: CPU61 killed (polled 0 ms)
[ 23.210450] psci: CPU62 killed (polled 0 ms)
[ 23.260485] psci: CPU63 killed (polled 0 ms)
[ 23.261213] Bye!
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x413fd0c1]
[ 0.000000] Linux version 6.9.0-rc1
(ec2-user@ip-10-0-31-71.us-west-2.compute.internal) (gcc (GCC) 11.4.1
20231218 (Red Hat 11.4.1-3), GNU ld version 2.35.2-43.el9) #1 SMP
PREEMPT_DYNAMIC Wed Mar 27 11:54:07 UTC 2024
[ 0.000000] KASLR enabled
[ 0.000000] random: crng init done
[ 0.000000] efi: EFI v2.7 by EDK II
[ 0.000000] efi: SMBIOS=0x7bed0000 SMBIOS 3.0=0x7beb0000
ACPI=0x786e0000 ACPI 2.0=0x786e0014 MEMATTR=0x7a759a98 RNG=0x70ea0018
MEMRESERVE=0x7857a918
...snip...
[ 5.747760] SELinux: policy capability cgroup_seclabel=1
[ 5.748618] SELinux: policy capability nnp_nosuid_transition=1
[ 5.749549] SELinux: policy capability genfs_seclabel_symlinks=1
[ 5.750507] SELinux: policy capability ioctl_skip_cloexec=0
[ 5.751397] SELinux: policy capability userspace_initial_context=0
[ 5.755680] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000018
[ 5.757055] Mem abort info:
[ 5.757505] ESR = 0x0000000096000004
[ 5.758125] EC = 0x25: DABT (current EL), IL = 32 bits
[ 5.758966] SET = 0, FnV = 0
[ 5.759457] EA = 0, S1PTW = 0
[ 5.759964] FSC = 0x04: level 0 translation fault
[ 5.760738] Data abort info:
[ 5.761197] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 5.762062] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 5.762866] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 5.763709] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000406d43000
[ 5.764717] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000
[ 5.765778] Internal error: Oops: 0000000096000004 [#1] SMP
[ 5.766659] Modules linked in: xfs(E) libcrc32c(E) nvme_tcp(E)
nvme_fabrics(E) crct10dif_ce(E) ghash_ce(E) sha2_ce(E) sha256_arm64(E)
sha1_ce(E) nvme(E) nvme_core(E) ena(E) sunrpc(E) dm_mirror(E)
dm_region_hash(E) dm_log(E) dm_mod(E) be2iscsi(E) cxgb4i(E) cxgb4(E)
tls(E) libcxgbi(E) libcxgb(E) qla4xxx(E) iscsi_boot_sysfs(E)
iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E)
fuse(E)
[ 5.772170] CPU: 7 PID: 1 Comm: systemd Tainted: G E
6.9.0-rc1 #1
[ 5.773380] Hardware name: Amazon EC2 i4g.16xlarge/, BIOS 1.0 11/1/2018
[ 5.774451] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5.775595] pc : __dentry_path+0xa8/0x1c8
[ 5.776266] lr : __dentry_path+0x8c/0x1c8
[ 5.776929] sp : ffff80008021b8a0
[ 5.777482] x29: ffff80008021b8a0 x28: ffff0003cc8f7ffe x27: ffff0003cc8f7fff
[ 5.778640] x26: ffffba330fc39000 x25: ffffba330fc39604 x24: ffff80008021b908
[ 5.779799] x23: ffff0003cbff1140 x22: 000000000000002f x21: 0000000000000366
[ 5.780971] x20: 0000000000000000 x19: 0000000000000ffe x18: ffffffffffffffff
[ 5.782130] x17: 00000000ada0e07b x16: 00000000892a1b2a x15: 0000000000000010
[ 5.783295] x14: 5d160d0000000000 x13: 0000000000000000 x12: 000000000000002c
[ 5.784467] x11: 0101010101010101 x10: 5d160d0000000000 x9 : ffffba330df0ee88
[ 5.785623] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 5.786788] x5 : fffffdffcf323dc0 x4 : 0000000000000000 x3 : 0000000000000000
[ 5.787950] x2 : ffff0003c1451480 x1 : 0000000000000000 x0 : 0000000000000000
[ 5.789115] Call trace:
[ 5.789526] __dentry_path+0xa8/0x1c8
[ 5.790141] dentry_path_raw+0x50/0x90
[ 5.790763] inode_doinit_with_dentry+0x310/0x520
[ 5.791543] sb_finish_set_opts+0x13c/0x358
[ 5.792233] selinux_set_mnt_opts+0x410/0x658
[ 5.792957] delayed_superblock_init+0x20/0x30
[ 5.793686] iterate_supers+0xa0/0x140
[ 5.794316] selinux_complete_init+0x28/0x88
[ 5.795014] selinux_policy_commit+0x2ac/0x2d0
[ 5.795748] sel_write_load+0x130/0x280
[ 5.796386] vfs_write+0xd8/0x360
[ 5.796948] ksys_write+0x70/0x108
[ 5.797512] __arm64_sys_write+0x20/0x30
[ 5.798172] invoke_syscall.constprop.0+0x7c/0xd0
[ 5.798943] do_el0_svc+0x4c/0xd0
[ 5.799493] el0_svc+0x44/0x1d8
[ 5.800024] el0t_64_sync_handler+0x134/0x150
[ 5.800757] el0t_64_sync+0x17c/0x180
[ 5.801367] Code: 540006cd 381ff376 aa1403e0 51000673 (f9400e94)
[ 5.802354] ---[ end trace 0000000000000000 ]---
[ 5.803107] Kernel panic - not syncing: Oops: Fatal exception
[ 5.804042] SMP: stopping secondary CPUs
[ 5.804726] Kernel Offset: 0x3a328dc00000 from 0xffff800080000000
[ 5.805713] PHYS_OFFSET: 0x40000000
[ 5.806288] CPU features: 0x0,0000080b,80100528,42417a0b
[ 5.807149] Memory Limit: none
[ 5.807668] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
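As a cross-check of the "Mem abort info" lines in the log above, the ESR
value can be decoded by hand. The following is a minimal C sketch, assuming
the AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits
[24:0], and for data aborts WnR in ISS bit 6 and the fault status code in
ISS bits [5:0]). The struct and function names are made up for illustration
and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch64 ESR_ELx value (illustrative names, not kernel types). */
struct esr_fields {
    unsigned ec;   /* exception class, bits [31:26] */
    unsigned il;   /* instruction length, bit 25 (1 = 32-bit instruction) */
    uint32_t iss;  /* instruction specific syndrome, bits [24:0] */
    unsigned wnr;  /* data aborts only: write-not-read, ISS bit 6 */
    unsigned fsc;  /* data aborts only: fault status code, ISS bits [5:0] */
};

/* Split a raw ESR value into the fields printed under "Mem abort info". */
struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;

    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1ffffff);
    f.wnr = (unsigned)((f.iss >> 6) & 0x1);
    f.fsc = (unsigned)(f.iss & 0x3f);
    return f;
}
```

For the value in the log, esr_decode(0x96000004) gives EC = 0x25, IL = 1,
WnR = 0 and FSC = 0x04, i.e. a read that took a level 0 translation fault,
which matches what the kernel printed.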
>
> Anyway, we do need to handle shutdown properly; please let me know if
> calling efa_remove solves your issue.
We also tested the same kernel code with the ".shutdown = efa_remove,"
change in efa_main.c, and the issue no longer reproduces. It looks to me
like this change solves the issue; however, according to Jason, it might
cause problems. I'm not familiar with this area, so could you please
work on a fix? Thanks in advance!
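
To illustrate why the one-line change matters, here is a small userspace
model in C. It is only a sketch: the model_* types and functions are
hypothetical stand-ins, not the real struct pci_driver or efa_remove(), and
it assumes that the shutdown pass run before kexec jumps to the new kernel
calls a driver's .shutdown callback only when one is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PCI device; NOT the kernel's struct pci_dev. */
struct model_pci_dev {
    bool irq_enabled;  /* device may still raise interrupts */
    bool dma_active;   /* device may still write to memory */
};

/* Hypothetical stand-in for a PCI driver; NOT the kernel's struct pci_driver. */
struct model_pci_driver {
    const char *name;
    void (*remove)(struct model_pci_dev *dev);
    void (*shutdown)(struct model_pci_dev *dev);  /* may be NULL */
};

/* Stand-in for efa_remove(): quiesce the device completely. */
void model_efa_remove(struct model_pci_dev *dev)
{
    assert(dev != NULL);
    dev->irq_enabled = false;
    dev->dma_active = false;
}

/* Stand-in for the shutdown pass: the callback runs only if it is set. */
void model_device_shutdown(const struct model_pci_driver *drv,
                           struct model_pci_dev *dev)
{
    if (drv->shutdown != NULL)
        drv->shutdown(dev);
}

/* Returns true if a busy device is quiet after the shutdown pass. */
bool model_quiet_after_shutdown(const struct model_pci_driver *drv)
{
    struct model_pci_dev dev = { .irq_enabled = true, .dma_active = true };

    model_device_shutdown(drv, &dev);
    return !dev.irq_enabled && !dev.dma_active;
}

/* Driver without a .shutdown callback, as in the failing case. */
const struct model_pci_driver model_efa_without_shutdown = {
    .name = "efa",
    .remove = model_efa_remove,
};

/* Driver with the tested one-line change. */
const struct model_pci_driver model_efa_with_shutdown = {
    .name = "efa",
    .remove = model_efa_remove,
    .shutdown = model_efa_remove,
};
```

In this model, model_quiet_after_shutdown() returns false for
model_efa_without_shutdown and true for model_efa_with_shutdown. It says
nothing about whether reusing the remove path at shutdown is safe in the
real driver, which is the concern raised by Jason.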
Thanks,
Tao Liu
>
> Michael
>
> On 3/26/2024 3:38 AM, Tao Liu wrote:
> > Hi Gal,
> >
> > On Mon, Mar 25, 2024 at 4:06 PM Gal Pressman <gal.pressman at linux.dev> wrote:
> >> On 25/03/2024 4:10, Tao Liu wrote:
> >>> Hi,
> >>>
> >>> Recently I hit a kernel panic related to the efa module when
> >>> testing kexec -l && kexec -e to switch to a new kernel on an AWS
> >>> i4g.16xlarge instance.
> >>>
> >>> Here is the dmesg log:
> >>>
> >>> [ 6.379918] systemd[1]: Mounting FUSE Control File System...
> >>> [ 6.381984] systemd[1]: Mounting Kernel Configuration File System...
> >>> [ 6.383918] systemd[1]: Starting Apply Kernel Variables...
> >>> [ 6.385430] systemd[1]: Started Journal Service.
> >>> [ 6.394221] ACPI: bus type drm_connector registered
> >>> [ 6.421408] systemd-journald[1263]: Received client request to
> >>> flush runtime journal.
> >>> [ 7.262543] efa 0000:00:1b.0: enabling device (0010 -> 0012)
> >>> [ 7.432420] efa 0000:00:1b.0: Setup irq:191 name:efa-mgmnt@pci:0000:00:1b.0
> >>> [ 7.435581] efa 0000:00:1b.0 efa_0: IB device registered
> >>> [ 7.885564] random: crng init done
> >>> [ 8.139857] XFS (nvme0n1p2): Mounting V5 Filesystem
> >>> d7003ecc-db6f-4bfb-bf92-60376b6a6563
> >>> [ 8.265233] XFS (nvme0n1p2): Ending clean mount
> >>> [ 10.555612] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>>
> >>> Red Hat Enterprise Linux 9.4 Beta (Plow)
> >>> Kernel 5.14.0-425.el9.aarch64 on an aarch64
> >>>
> >>> ip-10-0-27-226 login: [ 29.940381] kexec_core: Starting new kernel
> >>> [ 30.079279] psci: CPU1 killed (polled 0 ms)
> >>> [ 30.119222] psci: CPU2 killed (polled 0 ms)
> >>> [ 30.199293] psci: CPU3 killed (polled 0 ms)
> >>> [ 30.309214] psci: CPU4 killed (polled 0 ms)
> >>> [ 30.379221] psci: CPU5 killed (polled 0 ms)
> >>> [ 30.419210] psci: CPU6 killed (polled 0 ms)
> >>> [ 30.489207] IRQ 191: no longer affine to CPU7
> >>> [ 30.489667] psci: CPU7 killed (polled 0 ms)
> >>> ..snip...
> >>> [ 33.849123] psci: CPU63 killed (polled 0 ms)
> >>> [ 33.849943] Bye!
> >>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x413fd0c1]
> >>> [ 0.000000] Linux version 5.14.0-417.el9.aarch64
> >>> (mockbuild@arm64-025.build.eng.bos.redhat.com) (gcc (GCC) 11.4.1
> >>> 20231218 (Red Hat 11.4.1-3), GNU ld version 2.35.2-42.el9) #1 SMP
> >>> PREEMPT_DYNAMIC Thu Feb 1 21:23:03 EST 2024
> >>> ...snip...
> >>> [ 1.012692] Freeing unused kernel memory: 6016K
> >>> [ 2.370947] Checked W+X mappings: passed, no W+X pages found
> >>> [ 2.370980] Run /init as init process
> >>> [ 2.370982] with arguments:
> >>> [ 2.370983] /init
> >>> [ 2.370984] with environment:
> >>> [ 2.370984] HOME=/
> >>> [ 2.370985] TERM=linux
> >>> [ 2.373257] Kernel panic - not syncing: Attempted to kill init!
> >>> exitcode=0x0000000b
> >>> [ 2.373259] CPU: 1 PID: 1 Comm: init Not tainted 5.14.0-417.el9.aarch64 #1
> >>> [ 2.382240] Hardware name: Amazon EC2 i4g.16xlarge/, BIOS 1.0 11/1/2018
> >>> [ 2.383814] Call trace:
> >>> [ 2.384410] dump_backtrace+0xa8/0x120
> >>> [ 2.385318] show_stack+0x1c/0x30
> >>> [ 2.386124] dump_stack_lvl+0x74/0x8c
> >>> [ 2.387011] dump_stack+0x14/0x24
> >>> [ 2.387810] panic+0x158/0x368
> >>> [ 2.388553] do_exit+0x3a8/0x3b0
> >>> [ 2.389333] do_group_exit+0x38/0xa4
> >>> [ 2.390195] get_signal+0x7a4/0x810
> >>> [ 2.391044] do_signal+0x1bc/0x260
> >>> [ 2.391870] do_notify_resume+0x108/0x210
> >>> [ 2.392839] el0_da+0x154/0x160
> >>> [ 2.393603] el0t_64_sync_handler+0xdc/0x150
> >>> [ 2.394628] el0t_64_sync+0x17c/0x180
> >>> [ 2.395513] SMP: stopping secondary CPUs
> >>> [ 2.396483] Kernel Offset: 0x586f04e00000 from 0xffff800008000000
> >>> [ 2.397934] PHYS_OFFSET: 0x40000000
> >>> [ 2.398774] CPU features: 0x0,00000101,70020143,10417a0b
> >>> [ 2.400042] Memory Limit: none
> >>> [ 2.400783] ---[ end Kernel panic - not syncing: Attempted to kill
> >>> init! exitcode=0x0000000b ]---
> >>>
> >>> In the dmesg log, I found "[ 30.489207] IRQ 191: no longer affine to
> >>> CPU7" suspicious, as it is related to the efa module. After
> >>> blacklisting the efa module so that it is not loaded automatically at
> >>> boot, the kernel panic no longer appears.
> >>>
> >>> It looks to me like the efa device is not properly shut down during
> >>> kexec, so ongoing DMA, interrupts, etc. overwrite memory used by the
> >>> new kernel.
> >>>
> >>> Although the issue was reproduced on the RHEL kernel, the upstream
> >>> kernel [1] doesn't have the .shutdown method implemented either. Since
> >>> I'm not very familiar with the efa driver, could you please implement
> >>> the .shutdown method in drivers/infiniband/hw/efa/efa_main.c? Thanks
> >>> in advance!
> >> Did you try to reproduce it on upstream kernel?
> >>
> > Thanks for your comments! No I haven't, I will give it a try.
> >>> [1]: https://github.com/torvalds/linux/blob/master/drivers/infiniband/hw/efa/efa_main.c#L674
> >>>
> >>> Thanks,
> >>> Tao Liu
> >>>
> >> Try assigning efa_remove as the shutdown callback:
> >> .shutdown = efa_remove,
> >>
> >> Does it fix it?
> > Thanks, I will also try the code, and I will post the testing results.
> >
> > Thanks,
> > Tao Liu
> >
> >
>