[PATCH] nvmet-rdma: Suppress a class of lockdep complaints
Shinichiro Kawasaki
shinichiro.kawasaki at wdc.com
Wed May 10 09:09:51 PDT 2023
On May 09, 2023 / 16:24, Sagi Grimberg wrote:
>
> > > Bart, thank you very much for this immediate action after the
> > > discussion at LSF.
> > > This is encouraging. I applied the patch on top of v6.4-rc1 and ran
> > > the test case with various transports. Unfortunately, I observed
> > > kernel panics with rdma and siw transports [1][2]. Also I observed
> > > another lockdep WARN with tcp transport [3]. It looks that your fix
> > > unveiled more hidden issue/s.
> >
> > Please use siw instead of rxe when running blktests - there are known
> > issues with the rxe driver.
> >
> > Please apply these patches on top of kernel v6.3 instead of v6.4-rc1.
> > The hrtimer_interrupt() crash shown below is a v6.4-rc1 regression and
> > does not occur with the v6.3 kernel.
> >
> > Since my patch is for the RDMA transport, it is not clear to me why a
> > report for the TCP transport is included in a reply to my patch?
>
> Agree,
Sorry for my misunderstanding. I tested again with siw on kernel v6.3,
and I still see the kernel panic. Here are the kernel messages I got:
[ 59.567730][ T935] rdma_rxe: loaded
[ 59.614648][ T915] run blktests nvme/003 at 2023-05-10 08:48:26
[ 59.714402][ T948] SoftiWARP attached
[ 59.801368][ T969] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 59.810654][ T970] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 59.813025][ T970] nvmet_rdma: enabling port 0 (10.0.2.15:4420)
[ 59.861254][ T61] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress..
[ 59.869998][ T971] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.2.150
[ 69.939259][ T982] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 69.963920][ C2] ------------[ cut here ]------------
[ 69.964368][ C2] DEBUG_LOCKS_WARN_ON(1)
[ 69.964382][ C2] WARNING: CPU: 2 PID: 825 at kernel/locking/lockdep.c:232 __lock_acquire+0x28a4/00
[ 69.965436][ C2] Modules linked in: siw rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel nvmet_rdma ng
[ 69.969849][ C2] CPU: 2 PID: 825 Comm: kworker/2:4 Not tainted 6.3.0+ #5
[ 69.970389][ C2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/4
[ 69.971100][ C2] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
[ 69.971671][ C2] RIP: 0010:__lock_acquire+0x28a4/0x5eb0
[ 69.972090][ C2] Code: 08 84 d2 0f 85 52 22 00 00 83 3d 42 c2 18 04 00 75 b4 48 c7 c6 20 17 ce 85b
[ 69.973525][ C2] RSP: 0018:ffff8883aef09ce8 EFLAGS: 00010092
[ 69.973980][ C2] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 1ffff11075de1370
[ 69.974565][ C2] RDX: 0000000000010003 RSI: 0000000000000004 RDI: 0000000000000001
[ 69.975155][ C2] RBP: ffff8881215bc108 R08: 0000000000000001 R09: ffff8883aef3084b
[ 69.975744][ C2] R10: ffffed1075de6109 R11: 0000000000000001 R12: 0000000000000002
[ 69.976331][ C2] R13: 0000000000000000 R14: ffffffff8755632c R15: 00000000ffffffff
[ 69.976922][ C2] FS: 0000000000000000(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
[ 69.977576][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 69.978063][ C2] CR2: 0000563bf892e000 CR3: 000000012eb74000 CR4: 00000000000006e0
[ 69.978651][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 69.979241][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 69.979830][ C2] Call Trace:
[ 69.980074][ C2] <IRQ>
[ 69.980290][ C2] ? lock_acquire+0x1b7/0x4e0
[ 69.980641][ C2] ? __pfx___lock_acquire+0x10/0x10
[ 69.981030][ C2] ? __pfx_lock_acquire+0x10/0x10
[ 69.981402][ C2] ? update_process_times+0x158/0x1d0
[ 69.981804][ C2] ? __pfx___lock_acquire+0x10/0x10
[ 69.982192][ C2] lock_acquire+0x1a7/0x4e0
[ 69.982528][ C2] ? hrtimer_interrupt+0x100/0x810
[ 69.983794][ C2] ? __pfx_lock_acquire+0x10/0x10
[ 69.985043][ C2] ? hrtimer_interrupt+0x339/0x810
[ 69.986262][ C2] ? kvm_clock_read+0x14/0x30
[ 69.987449][ C2] _raw_spin_lock_irqsave+0x47/0x70
[ 69.988673][ C2] ? hrtimer_interrupt+0x100/0x810
[ 69.989922][ C2] hrtimer_interrupt+0x100/0x810
[ 69.991111][ C2] ? __pfx_sched_clock_cpu+0x10/0x10
[ 69.992319][ C2] __sysvec_apic_timer_interrupt+0x146/0x3f0
[ 69.993581][ C2] sysvec_apic_timer_interrupt+0x8a/0xb0
[ 69.994853][ C2] </IRQ>
[ 69.995882][ C2] <TASK>
[ 69.996900][ C2] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 69.998122][ C2] RIP: 0010:lockdep_unregister_key+0x105/0x250
[ 69.999323][ C2] Code: 7c 08 84 d2 0f 85 29 01 00 00 8b 05 75 7e 19 04 85 c0 74 02 0f 0b e8 6a e7f
[ 70.002306][ C2] RSP: 0018:ffff88810ad67ca0 EFLAGS: 00000206
[ 70.003521][ C2] RAX: 0000000000000002 RBX: ffffffff897afb78 RCX: 0000000000000001
[ 70.004921][ C2] RDX: 0000000000000000 RSI: ffffffff85ce0fa0 RDI: ffffffff85fa7100
[ 70.006259][ C2] RBP: ffff88812da24ae0 R08: 0000000000000000 R09: ffff8883aef45c2f
[ 70.007590][ C2] R10: ffffed1075de8b85 R11: ffff8881215bb280 R12: 0000000000000000
[ 70.008924][ C2] R13: 0000000000000246 R14: ffffffff89939550 R15: ffff88812f5c3118
[ 70.010209][ C2] nvmet_rdma_free_queue+0x2e/0x390 [nvmet_rdma]
[ 70.011363][ C2] nvmet_rdma_release_queue_work+0x3e/0x90 [nvmet_rdma]
[ 70.012535][ C2] process_one_work+0x7e4/0x1390
[ 70.013568][ C2] ? __pfx_lock_acquire+0x10/0x10
[ 70.014594][ C2] ? __pfx_process_one_work+0x10/0x10
[ 70.015661][ C2] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 70.016745][ C2] worker_thread+0xf7/0x12b0
[ 70.017730][ C2] ? __kthread_parkme+0xc1/0x1f0
[ 70.018687][ C2] ? __pfx_worker_thread+0x10/0x10
[ 70.019645][ C2] kthread+0x29e/0x340
[ 70.020517][ C2] ? __pfx_kthread+0x10/0x10
[ 70.021427][ C2] ret_from_fork+0x2c/0x50
[ 70.022323][ C2] </TASK>
[ 70.023111][ C2] irq event stamp: 3790
[ 70.023976][ C2] hardirqs last enabled at (3789): [<ffffffff85a00e86>] asm_sysvec_apic_timer_int0
[ 70.025311][ C2] hardirqs last disabled at (3790): [<ffffffff85907a9a>] sysvec_apic_timer_interru0
[ 70.026609][ C2] softirqs last enabled at (3158): [<ffffffff832483be>] __irq_exit_rcu+0xfe/0x260
[ 70.027887][ C2] softirqs last disabled at (3151): [<ffffffff832483be>] __irq_exit_rcu+0xfe/0x260
[ 70.029108][ C2] ---[ end trace 0000000000000000 ]---
[ 70.030101][ C2] general protection fault, probably for non-canonical address 0xdffffc0000000008:I
[ 70.031556][ C2] KASAN: null-ptr-deref in range [0x0000000000000040-0x0000000000000047]
[ 70.032794][ C2] CPU: 2 PID: 825 Comm: kworker/2:4 Tainted: G W 6.3.0+ #5
[ 70.034045][ C2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/4
[ 70.035360][ C2] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
[ 70.036559][ C2] RIP: 0010:__lock_acquire+0x2481/0x5eb0
[ 70.037617][ C2] Code: 0f 83 a7 03 00 00 48 8d 1c 5b 48 c1 e3 06 48 81 c3 e0 3f 7b 89 48 b8 00 00f
[ 70.040455][ C2] RSP: 0018:ffff8883aef09ce8 EFLAGS: 00010002
[ 70.041629][ C2] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff11075de1370
[ 70.042955][ C2] RDX: 0000000000000008 RSI: 0000000000000004 RDI: 0000000000000040
[ 70.044233][ C2] RBP: ffff8881215bc108 R08: 0000000000000001 R09: ffff8883aef3084b
[ 70.045519][ C2] R10: ffffed1075de6109 R11: ffff8881215bb280 R12: 0000000000000002
[ 70.046855][ C2] R13: 0000000000000000 R14: ffffffff8755632c R15: 00000000ffffffff
[ 70.048148][ C2] FS: 0000000000000000(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
[ 70.049519][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.050776][ C2] CR2: 0000563bf892e000 CR3: 000000012eb74000 CR4: 00000000000006e0
[ 70.052120][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 70.053428][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 70.054774][ C2] Call Trace:
[ 70.055795][ C2] <IRQ>
[ 70.056779][ C2] ? lock_acquire+0x1b7/0x4e0
[ 70.057890][ C2] ? __pfx___lock_acquire+0x10/0x10
[ 70.058980][ C2] ? __pfx_lock_acquire+0x10/0x10
[ 70.060031][ C2] ? update_process_times+0x158/0x1d0
[ 70.061089][ C2] ? __pfx___lock_acquire+0x10/0x10
[ 70.062128][ C2] lock_acquire+0x1a7/0x4e0
[ 70.063114][ C2] ? hrtimer_interrupt+0x100/0x810
[ 70.064133][ C2] ? __pfx_lock_acquire+0x10/0x10
[ 70.065138][ C2] ? hrtimer_interrupt+0x339/0x810
[ 70.066153][ C2] ? kvm_clock_read+0x14/0x30
[ 70.067127][ C2] _raw_spin_lock_irqsave+0x47/0x70
[ 70.068136][ C2] ? hrtimer_interrupt+0x100/0x810
[ 70.069135][ C2] hrtimer_interrupt+0x100/0x810
[ 70.070114][ C2] ? __pfx_sched_clock_cpu+0x10/0x10
[ 70.071129][ C2] __sysvec_apic_timer_interrupt+0x146/0x3f0
[ 70.072201][ C2] sysvec_apic_timer_interrupt+0x8a/0xb0
[ 70.073246][ C2] </IRQ>
[ 70.074097][ C2] <TASK>
[ 70.074941][ C2] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 70.076012][ C2] RIP: 0010:lockdep_unregister_key+0x105/0x250
[ 70.077089][ C2] Code: 7c 08 84 d2 0f 85 29 01 00 00 8b 05 75 7e 19 04 85 c0 74 02 0f 0b e8 6a e7f
[ 70.079842][ C2] RSP: 0018:ffff88810ad67ca0 EFLAGS: 00000206
[ 70.080983][ C2] RAX: 0000000000000002 RBX: ffffffff897afb78 RCX: 0000000000000001
[ 70.082242][ C2] RDX: 0000000000000000 RSI: ffffffff85ce0fa0 RDI: ffffffff85fa7100
[ 70.083498][ C2] RBP: ffff88812da24ae0 R08: 0000000000000000 R09: ffff8883aef45c2f
[ 70.084767][ C2] R10: ffffed1075de8b85 R11: ffff8881215bb280 R12: 0000000000000000
[ 70.086040][ C2] R13: 0000000000000246 R14: ffffffff89939550 R15: ffff88812f5c3118
[ 70.087312][ C2] nvmet_rdma_free_queue+0x2e/0x390 [nvmet_rdma]
[ 70.088457][ C2] nvmet_rdma_release_queue_work+0x3e/0x90 [nvmet_rdma]
[ 70.089629][ C2] process_one_work+0x7e4/0x1390
[ 70.090651][ C2] ? __pfx_lock_acquire+0x10/0x10
[ 70.091665][ C2] ? __pfx_process_one_work+0x10/0x10
[ 70.092698][ C2] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 70.093717][ C2] worker_thread+0xf7/0x12b0
[ 70.094656][ C2] ? __kthread_parkme+0xc1/0x1f0
[ 70.095592][ C2] ? __pfx_worker_thread+0x10/0x10
[ 70.096533][ C2] kthread+0x29e/0x340
[ 70.097392][ C2] ? __pfx_kthread+0x10/0x10
[ 70.098286][ C2] ret_from_fork+0x2c/0x50
[ 70.099170][ C2] </TASK>
[ 70.099941][ C2] Modules linked in: siw rdma_rxe ib_uverbs ip6_udp_tunnel udp_tunnel nvmet_rdma ng
[ 70.107198][ C2] ---[ end trace 0000000000000000 ]---
[ 70.108252][ C2] RIP: 0010:__lock_acquire+0x2481/0x5eb0
[ 70.109319][ C2] Code: 0f 83 a7 03 00 00 48 8d 1c 5b 48 c1 e3 06 48 81 c3 e0 3f 7b 89 48 b8 00 00f
[ 70.112125][ C2] RSP: 0018:ffff8883aef09ce8 EFLAGS: 00010002
[ 70.113284][ C2] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff11075de1370
[ 70.114588][ C2] RDX: 0000000000000008 RSI: 0000000000000004 RDI: 0000000000000040
[ 70.115899][ C2] RBP: ffff8881215bc108 R08: 0000000000000001 R09: ffff8883aef3084b
[ 70.117205][ C2] R10: ffffed1075de6109 R11: ffff8881215bb280 R12: 0000000000000002
[ 70.118517][ C2] R13: 0000000000000000 R14: ffffffff8755632c R15: 00000000ffffffff
[ 70.119840][ C2] FS: 0000000000000000(0000) GS:ffff8883aef00000(0000) knlGS:0000000000000000
[ 70.121226][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 70.122449][ C2] CR2: 0000563bf892e000 CR3: 000000012eb74000 CR4: 00000000000006e0
[ 70.123791][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 70.125131][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 70.126462][ C2] Kernel panic - not syncing: Fatal exception in interrupt
[ 70.127873][ C2] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80)
[ 70.129488][ C2] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
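
A note for reading the KASAN lines in the log above: the general protection fault reports the non-canonical address 0xdffffc0000000008, while KASAN reports a null-ptr-deref in the range 0x40-0x47. The two are related through the KASAN shadow mapping. The sketch below is only an illustration of that arithmetic, assuming the usual x86_64 generic KASAN parameters (shadow offset 0xdffffc0000000000 and a scale shift of 3, which match the RAX, RDX and RDI values in the register dump). The function names are made up for this example and are not kernel APIs.

```python
# Assumed x86_64 generic KASAN parameters (not taken from this kernel's config).
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
KASAN_SHADOW_SCALE_SHIFT = 3  # one shadow byte covers 8 bytes of memory


def mem_to_shadow(addr: int) -> int:
    """Return the shadow byte address that describes the memory at 'addr'."""
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET


def shadow_to_range(shadow_addr: int) -> tuple[int, int]:
    """Return the first and last memory address covered by one shadow byte."""
    first = (shadow_addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT
    last = first + (1 << KASAN_SHADOW_SCALE_SHIFT) - 1
    return first, last


if __name__ == "__main__":
    # Faulting address from the "general protection fault" line.
    first, last = shadow_to_range(0xDFFFFC0000000008)
    print(f"range [{first:#018x}-{last:#018x}]")
    # RDI in the register dump holds 0x40; its shadow address is the faulting one.
    print(f"shadow of 0x40 = {mem_to_shadow(0x40):#x}")
```

Under these assumptions, the fault is consistent with __lock_acquire() reading a field at offset 0x40 from a NULL pointer.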