nvme-6.3: blktests nvme/003 null pointer for fc transport
Daniel Wagner
dwagner at suse.de
Wed Feb 1 00:10:03 PST 2023
On Tue, Jan 31, 2023 at 03:27:20PM -0800, James Smart wrote:
> On 1/30/2023 3:15 AM, Daniel Wagner wrote:
> > Hi,
> >
> > Hannes added support to blktests to enable the FC transport. As a result,
> > blktests is able to crash the kernel. I haven't started to look into it yet.
> >
> > run blktests nvme/003 at 2023-01-30 12:08:28
> > nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> > BUG: kernel NULL pointer dereference, address: 0000000000000010
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> > Oops: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 3 PID: 14200 Comm: nvme Not tainted 6.2.0-rc4+ #1 c9afca57e89c2aa3234427c2cd6cd4dd2984ae8c
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > RIP: 0010:nvme_alloc_admin_tag_set+0x38/0x120 [nvme_core]
>
> bingo...
>
> The cleanup patch "nvme-fc: use the tagset alloc/free helpers" introduced
> it.
>
> See the fix patch "nvme-fc: Fix initialization order", posted about 10 days
> ago, which corrects it.
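That would explain the oops: nvme_alloc_admin_tag_set() dereferences
ctrl->ops, and with ctrl->ops still NULL a read of its flags member would
match the faulting address 0x10, so it looks like the helper ran before
nvme_init_ctrl() had filled in the ops pointer. A minimal userspace sketch of
that ordering class, with made-up names rather than the actual nvme-fc code:

/*
 * Illustrative only: stands in for the "tag-set helper runs before the
 * controller is initialized" pattern, not for the real driver code.
 */
#include <stdio.h>
#include <stddef.h>

struct ctrl_ops { unsigned int flags; };          /* ~ nvme_ctrl_ops */
struct ctrl     { const struct ctrl_ops *ops; };  /* ~ nvme_ctrl     */

static const struct ctrl_ops fc_ops = { .flags = 1 };

static void alloc_admin_tag_set(struct ctrl *c)
{
	/* Reads c->ops; with c->ops still NULL this is the NULL deref. */
	printf("fabrics: %u\n", c->ops->flags & 1);
}

static void init_ctrl(struct ctrl *c)
{
	c->ops = &fc_ops;	/* what nvme_init_ctrl() does for real */
}

int main(void)
{
	struct ctrl c = { .ops = NULL };

	/* Buggy order would be calling alloc_admin_tag_set(&c) here. */
	init_ctrl(&c);		/* correct order: initialize first     */
	alloc_admin_tag_set(&c);
	return 0;
}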
Good call! This problem is gone. Now I am running into the next problem:
run blktests nvme/003 at 2023-02-01 09:01:58
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
debugfs: Directory 'nvme0' with parent '/' already present!
nvme0: failed to create debugfs attr
nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
(NULL device *): {0:0} Association created
nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:2cd2c43b-a90a-45c1-a8cd-86b33ab273b5.
nvme nvme0: NVME-FC{0}: controller connect complete
nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:1} Association created
nvmet: creating nvm controller 2 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:2cd2c43b-a90a-45c1-a8cd-86b33ab273b5.
nvme nvme1: NVME-FC{1}: controller connect complete
nvme nvme1: NVME-FC{1}: new ctrl: NQN "blktests-subsystem-1"
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme1: rescanning namespaces.
(NULL device *): {0:0} Association deleted
(NULL device *): {0:0} Association freed
(NULL device *): Disconnect LS failed: No Association
nvme nvme1: NVME-FC{1}: io failed due to lldd error 6
nvme nvme1: NVME-FC{1}: transport association event: transport detected io error
nvme nvme1: NVME-FC{1}: resetting controller
debugfs: Directory 'nvme0' with parent '/' already present!
nvme0: failed to create debugfs attr
nvme nvme0: NVME-FC{0}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "nqn.2014-08.org.nvmexpress.discovery"
(NULL device *): {0:0} Association created
nvmet: connect request for invalid subsystem nqn.2014-08.org.nvmexpress.discovery!
nvme nvme0: Connect Invalid Data Parameter, subsysnqn "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (16770)
nvme nvme0: NVME-FC{0}: reconnect failure
nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
nvme nvme1: NVME-FC{1}: create association : host wwpn 0x20001100aa000002 rport wwpn 0x20001100aa000001: NQN "blktests-subsystem-1"
(NULL device *): {0:2} Association created
nvmet: connect request for invalid subsystem blktests-subsystem-1!
nvme nvme1: Connect Invalid Data Parameter, subsysnqn "blktests-subsystem-1"
nvme nvme1: NVME-FC{1}: reset: Reconnect attempt failed (16770)
nvme nvme1: NVME-FC{1}: reconnect failure
nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
(NULL device *): {0:1} Association deleted
(NULL device *): {0:1} Association freed
(NULL device *): Disconnect LS failed: No Association
(NULL device *): {0:0} Association deleted
(NULL device *): {0:0} Association freed
(NULL device *): Disconnect LS failed: No Association
(NULL device *): {0:2} Association deleted
(NULL device *): {0:2} Association freed
===========================================
WARNING: possible recursive locking detected
6.2.0-rc4+ #1 Not tainted
--------------------------------------------
kworker/3:13/21736 is trying to acquire lock:
ffff93fe60e251e8 ((work_completion)(&tport->ls_work)){+.+.}-{0:0}, at: __flush_work+0x38/0xc0
but task is already holding lock:
ffffa693c1a4fe68 ((work_completion)(&tport->ls_work)){+.+.}-{0:0}, at: process_one_work+0x236/0x410
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock((work_completion)(&tport->ls_work));
lock((work_completion)(&tport->ls_work));
*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by kworker/3:13/21736:
#0: ffff93fe9882ff48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x210/0x410
#1: ffffa693c1a4fe68 ((work_completion)(&tport->ls_work)){+.+.}-{0:0}, at: process_one_work+0x236/0x410
stack backtrace:
CPU: 3 PID: 21736 Comm: kworker/3:13 Not tainted 6.2.0-rc4+ #1 c9afca57e89c2aa3234427c2cd6cd4dd2984ae8c
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Workqueue: nvmet-wq fcloop_tport_lsrqst_work [nvme_fcloop]
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x80
validate_chain+0x1e02/0x2730
? lockdep_unlock+0x51/0xc0
? lockdep_unlock+0x51/0xc0
? validate_chain+0x7c8/0x2730
? validate_chain+0x7c8/0x2730
? lock_release+0xf7/0x400
? __lock_acquire+0x8f7/0xbc0
? lock_is_held_type+0x8b/0x110
__lock_acquire+0x8f7/0xbc0
? lock_is_held_type+0x8b/0x110
lock_acquire+0xb0/0x180
? __flush_work+0x38/0xc0
? lock_is_held_type+0x8b/0x110
__flush_work+0x57/0xc0
? __flush_work+0x38/0xc0
? lockdep_hardirqs_on_prepare+0x16e/0x220
? _raw_spin_unlock_irqrestore+0x32/0x50
? __free_pages_ok+0x289/0x4a0
fcloop_targetport_delete+0x15/0xa0 [nvme_fcloop 2f0f07bf6746e7b4ec6dff8d226e919dac60e0f1]
nvmet_fc_tgtport_put+0x132/0x190 [nvmet_fc cec46c808c67bf2a4fefe925da2828f0b3f2334a]
nvmet_fc_disconnect_assoc_done+0x7c/0xa0 [nvmet_fc cec46c808c67bf2a4fefe925da2828f0b3f2334a]
fcloop_tport_lsrqst_work+0x6c/0xa0 [nvme_fcloop 2f0f07bf6746e7b4ec6dff8d226e919dac60e0f1]
process_one_work+0x26a/0x410
worker_thread+0x21d/0x490
? rcu_lock_release+0x20/0x20
kthread+0xe1/0x100
? kthread_blkcg+0x30/0x30
ret_from_fork+0x1f/0x30
</TASK>
nvme_fcloop: fcloop_exit: Failed deleting target port
nvme_fcloop: fcloop_exit: Failed deleting remote port
nvme_fcloop: fcloop_exit: Failed deleting target port
nvme_fcloop: fcloop_exit: Failed deleting remote port
nvme_fcloop: fcloop_exit: Failed deleting target port
nvme_fcloop: fcloop_exit: Failed deleting remote port
The last two messages never stop.
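Regarding the lockdep splat above: from the trace, the last reference on the
tgtport appears to be dropped from inside tport->ls_work itself:
fcloop_tport_lsrqst_work() -> nvmet_fc_disconnect_assoc_done() ->
nvmet_fc_tgtport_put() -> fcloop_targetport_delete(), which then calls
flush_work() on the very ls_work it is currently running from. Lockdep models
a work item's completion as a lock held by process_one_work(), hence the
recursive-lock report. A minimal self-contained module showing just that
pattern (illustrative, not the fcloop code):

#include <linux/module.h>
#include <linux/workqueue.h>

static struct work_struct ls_work;

static void ls_work_fn(struct work_struct *work)
{
	/*
	 * Flushing the work item we are currently executing: lockdep
	 * reports the same recursive (work_completion) acquisition as
	 * above, and the flush can deadlock for real if it ends up
	 * waiting for this very instance to finish.
	 */
	flush_work(&ls_work);
}

static int __init selfflush_init(void)
{
	INIT_WORK(&ls_work, ls_work_fn);
	schedule_work(&ls_work);
	return 0;
}

static void __exit selfflush_exit(void)
{
	cancel_work_sync(&ls_work);
}

module_init(selfflush_init);
module_exit(selfflush_exit);
MODULE_LICENSE("GPL");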