[PATCHv3] nvme: authentication errors are always non-retryable
Hannes Reinecke
hare at suse.de
Tue Feb 27 03:04:01 PST 2024
On 2/27/24 11:09, Daniel Wagner wrote:
> On Tue, Feb 27, 2024 at 09:51:17AM +0100, Daniel Wagner wrote:
>> On Tue, Feb 27, 2024 at 08:54:32AM +0100, Hannes Reinecke wrote:
>>> Ouch.
>>>
>>> Does this help?
>>
>> Yes, this helps. I am running the whole test suite for all transports a
>> few times now, just to make sure.
>
> - rdma nvme/031
>
> [ 2454.989813] kworker/0:35/22379 is trying to acquire lock:
> [ 2454.989813] ffff88810a68dd48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: __flush_work+0x72/0x7d0
> [ 2454.989813]
> but task is already holding lock:
> [ 2454.989813] ffff88810a68dd48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_scheduled_works+0x6d4/0xf80
> [ 2455.001905]
> other info that might help us debug this:
> [ 2455.001905] Possible unsafe locking scenario:
>
> [ 2455.005799] CPU0
> [ 2455.005799] ----
> [ 2455.005799] lock((wq_completion)nvmet-wq);
> [ 2455.005799] lock((wq_completion)nvmet-wq);
> [ 2455.005799]
> *** DEADLOCK ***
>
> [ 2455.005799] May be due to missing lock nesting notation
>
> [ 2455.005799] 3 locks held by kworker/0:35/22379:
> [ 2455.005799] #0: ffff88810a68dd48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_scheduled_works+0x6d4/0xf80
> [ 2455.005799] #1: ffff88812e997d68 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_scheduled_works+0x6d4/0xf80
> [ 2455.021848] #2: ffffffff920c60c0 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x72/0x7d0
> [ 2455.021848]
> stack backtrace:
> [ 2455.021848] CPU: 0 PID: 22379 Comm: kworker/0:35 Tainted: G W 6.8.0-rc3+ #39 3d0b6128d1ea3c6026a2c1de70ba6c7dc10623c3
> [ 2455.021848] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
> [ 2455.021848] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
> [ 2455.021848] Call Trace:
> [ 2455.021848] <TASK>
> [ 2455.021848] dump_stack_lvl+0x5b/0x80
> [ 2455.021848] __lock_acquire+0x65f2/0x7b00
> [ 2455.021848] ? lock_release+0x25c/0xcd0
> [ 2455.021848] lock_acquire+0x11c/0x3d0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] ? lockdep_hardirqs_on_prepare+0x2b0/0x5f0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] __flush_work+0x648/0x7d0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] ? __cfi_wq_barrier_func+0x10/0x10
> [ 2455.021848] nvmet_ctrl_put+0x3b2/0x640 [nvmet 06f982ecc8920c7359ba51cc41092b5d91a725d5]
> [ 2455.021848] nvmet_sq_destroy+0x2e7/0x350 [nvmet 06f982ecc8920c7359ba51cc41092b5d91a725d5]
> [ 2455.021848] nvmet_rdma_free_queue+0x35/0x590 [nvmet_rdma d9fba27fd955e2c2575b3e184853a6daafaffc5c]
> [ 2455.021848] ? do_raw_spin_unlock+0x116/0x890
> [ 2455.021848] ? process_scheduled_works+0x6d4/0xf80
> [ 2455.021848] nvmet_rdma_release_queue_work+0x43/0xa0 [nvmet_rdma d9fba27fd955e2c2575b3e184853a6daafaffc5c]
> [ 2455.021848] process_scheduled_works+0x774/0xf80
> [ 2455.060121] worker_thread+0x8c4/0xfc0
> [ 2455.061903] ? __kthread_parkme+0x84/0x120
> [ 2455.063471] kthread+0x25d/0x2e0
> [ 2455.065681] ? __cfi_worker_thread+0x10/0x10
> [ 2455.065681] ? __cfi_kthread+0x10/0x10
> [ 2455.065681] ret_from_fork+0x41/0x70
> [ 2455.065681] ? __cfi_kthread+0x10/0x10
> [ 2455.065681] ret_from_fork_asm+0x1b/0x30
> [ 2455.065681] </TASK>
> [ 2455.096534] nvmet: adding nsid 1 to subsystem blktests-subsystem-5
>
> This looks familiar. I thought we had addressed it. Maybe I am missing
> the fix since this is on 6.8-rc3.
>
No, looks like a different one.
The problem seems to be the 'flush_workqueue(nvmet_wq)' call in
nvmet_rdma_remove_one(). When calling that we end up with this call
chain:
  flush_workqueue(nvmet_wq)
    rdma_release_queue_work()
      rdma_free_queue()
        sq_destroy()
          ctrl_put()
            ctrl_free()
              flush_work(async_event_work)
which deadlocks as we're already flushing the workqueue.
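(For illustration, a minimal, hypothetical module showing the pattern
lockdep flags here: a work item running on a WQ_MEM_RECLAIM workqueue
flushing other work queued on the same workqueue. All names are made up,
this is not the nvmet code.)

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;
static struct work_struct inner_work, outer_work;

static void inner_fn(struct work_struct *w)
{
}

static void outer_fn(struct work_struct *w)
{
	queue_work(demo_wq, &inner_work);
	/*
	 * We are running on demo_wq and flush other work queued on
	 * demo_wq. Because demo_wq has a rescuer (WQ_MEM_RECLAIM),
	 * lockdep reports (wq_completion)demo_wq taken recursively,
	 * just like the nvmet-wq splat above, even if another worker
	 * happens to pick up inner_work.
	 */
	flush_work(&inner_work);
}

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo_wq", WQ_MEM_RECLAIM, 0);
	if (!demo_wq)
		return -ENOMEM;
	INIT_WORK(&inner_work, inner_fn);
	INIT_WORK(&outer_work, outer_fn);
	queue_work(demo_wq, &outer_work);
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");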
What would happen if we do _not_ call flush_workqueue() in
nvmet_rdma_remove_one(), but rather move it into nvmet_rdma_exit()?
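Something along these lines is what I have in mind (untested sketch;
surrounding code elided and only approximate):

static void nvmet_rdma_remove_one(struct ib_device *ib_device,
		void *client_data)
{
	/* ... disconnect all queues belonging to ib_device ... */

	/*
	 * No flush_workqueue(nvmet_wq) here anymore; per the call chain
	 * above it ends up waiting on the release work, which itself
	 * flushes work on nvmet_wq.
	 */
}

static void __exit nvmet_rdma_exit(void)
{
	nvmet_unregister_transport(&nvmet_rdma_ops);
	ib_unregister_client(&nvmet_rdma_ib_client);

	/* Flush once here instead, outside of any nvmet_wq work item. */
	flush_workqueue(nvmet_wq);

	/* ... remaining teardown ... */
}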
Cheers,
Hannes