[PATCHv3] nvme: authentication errors are always non-retryable
Hannes Reinecke
hare at suse.de
Tue Feb 27 03:04:01 PST 2024
On 2/27/24 11:09, Daniel Wagner wrote:
> On Tue, Feb 27, 2024 at 09:51:17AM +0100, Daniel Wagner wrote:
>> On Tue, Feb 27, 2024 at 08:54:32AM +0100, Hannes Reinecke wrote:
>>> Ouch.
>>>
>>> Does this help?
>>
>> Yes, this helps. I am running the whole test suite for all transports a
>> few times now, just to make sure.
>
> - rdma nvme/031
>
> [ 2454.989813] kworker/0:35/22379 is trying to acquire lock:
> [ 2454.989813] ffff88810a68dd48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: __flush_work+0x72/0x7d0
> [ 2454.989813]
> but task is already holding lock:
> [ 2454.989813] ffff88810a68dd48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_scheduled_works+0x6d4/0xf80
> [ 2455.001905]
> other info that might help us debug this:
> [ 2455.001905] Possible unsafe locking scenario:
>
> [ 2455.005799] CPU0
> [ 2455.005799] ----
> [ 2455.005799] lock((wq_completion)nvmet-wq);
> [ 2455.005799] lock((wq_completion)nvmet-wq);
> [ 2455.005799]
> *** DEADLOCK ***
>
> [ 2455.005799] May be due to missing lock nesting notation
>
> [ 2455.005799] 3 locks held by kworker/0:35/22379:
> [ 2455.005799] #0: ffff88810a68dd48 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_scheduled_works+0x6d4/0xf80
> [ 2455.005799] #1: ffff88812e997d68 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_scheduled_works+0x6d4/0xf80
> [ 2455.021848] #2: ffffffff920c60c0 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x72/0x7d0
> [ 2455.021848]
> stack backtrace:
> [ 2455.021848] CPU: 0 PID: 22379 Comm: kworker/0:35 Tainted: G W 6.8.0-rc3+ #39 3d0b6128d1ea3c6026a2c1de70ba6c7dc10623c3
> [ 2455.021848] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
> [ 2455.021848] Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
> [ 2455.021848] Call Trace:
> [ 2455.021848] <TASK>
> [ 2455.021848] dump_stack_lvl+0x5b/0x80
> [ 2455.021848] __lock_acquire+0x65f2/0x7b00
> [ 2455.021848] ? lock_release+0x25c/0xcd0
> [ 2455.021848] lock_acquire+0x11c/0x3d0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] ? lockdep_hardirqs_on_prepare+0x2b0/0x5f0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] __flush_work+0x648/0x7d0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] ? __flush_work+0x72/0x7d0
> [ 2455.021848] ? __cfi_wq_barrier_func+0x10/0x10
> [ 2455.021848] nvmet_ctrl_put+0x3b2/0x640 [nvmet 06f982ecc8920c7359ba51cc41092b5d91a725d5]
> [ 2455.021848] nvmet_sq_destroy+0x2e7/0x350 [nvmet 06f982ecc8920c7359ba51cc41092b5d91a725d5]
> [ 2455.021848] nvmet_rdma_free_queue+0x35/0x590 [nvmet_rdma d9fba27fd955e2c2575b3e184853a6daafaffc5c]
> [ 2455.021848] ? do_raw_spin_unlock+0x116/0x890
> [ 2455.021848] ? process_scheduled_works+0x6d4/0xf80
> [ 2455.021848] nvmet_rdma_release_queue_work+0x43/0xa0 [nvmet_rdma d9fba27fd955e2c2575b3e184853a6daafaffc5c]
> [ 2455.021848] process_scheduled_works+0x774/0xf80
> [ 2455.060121] worker_thread+0x8c4/0xfc0
> [ 2455.061903] ? __kthread_parkme+0x84/0x120
> [ 2455.063471] kthread+0x25d/0x2e0
> [ 2455.065681] ? __cfi_worker_thread+0x10/0x10
> [ 2455.065681] ? __cfi_kthread+0x10/0x10
> [ 2455.065681] ret_from_fork+0x41/0x70
> [ 2455.065681] ? __cfi_kthread+0x10/0x10
> [ 2455.065681] ret_from_fork_asm+0x1b/0x30
> [ 2455.065681] </TASK>
> [ 2455.096534] nvmet: adding nsid 1 to subsystem blktests-subsystem-5
>
> This looks familiar. I thought we had addressed it. Maybe I am missing
> the fix since this is on 6.8-rc3.
>
No, looks like a different one.
The problem seems to be the 'flush_workqueue(nvmet_wq)' call in
nvmet_rdma_remove_one(). When calling that we end up with this call
chain:
  flush_workqueue(nvmet_wq)
    rdma_release_queue_work()
      rdma_free_queue()
        sq_destroy()
          ctrl_put()
            ctrl_free()
              flush_work(async_event_work)
which deadlocks as we're already flushing the workqueue.
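(For illustration, a minimal, hypothetical module showing the pattern
lockdep flags here: a work item running on a WQ_MEM_RECLAIM workqueue
flushing other work queued on the same workqueue. All names are made up,
this is not the nvmet code.)

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;
static struct work_struct inner_work, outer_work;

static void inner_fn(struct work_struct *w)
{
}

static void outer_fn(struct work_struct *w)
{
	queue_work(demo_wq, &inner_work);
	/*
	 * We are running on demo_wq and flush other work queued on
	 * demo_wq. Because demo_wq has a rescuer (WQ_MEM_RECLAIM),
	 * lockdep reports (wq_completion)demo_wq taken recursively,
	 * just like the nvmet-wq splat above, even if another worker
	 * happens to pick up inner_work.
	 */
	flush_work(&inner_work);
}

static int __init demo_init(void)
{
	demo_wq = alloc_workqueue("demo_wq", WQ_MEM_RECLAIM, 0);
	if (!demo_wq)
		return -ENOMEM;
	INIT_WORK(&inner_work, inner_fn);
	INIT_WORK(&outer_work, outer_fn);
	queue_work(demo_wq, &outer_work);
	return 0;
}

static void __exit demo_exit(void)
{
	destroy_workqueue(demo_wq);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");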
What would happen if we do _not_ call flush_workqueue() in
nvmet_rdma_remove_one(), but rather move it into nvmet_rdma_exit()?
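Something along these lines is what I have in mind (untested sketch;
surrounding code elided and only approximate):

static void nvmet_rdma_remove_one(struct ib_device *ib_device,
		void *client_data)
{
	/* ... disconnect all queues belonging to ib_device ... */

	/*
	 * No flush_workqueue(nvmet_wq) here anymore; per the call chain
	 * above it ends up waiting on the release work, which itself
	 * flushes work on nvmet_wq.
	 */
}

static void __exit nvmet_rdma_exit(void)
{
	nvmet_unregister_transport(&nvmet_rdma_ops);
	ib_unregister_client(&nvmet_rdma_ib_client);

	/* Flush once here instead, outside of any nvmet_wq work item. */
	flush_workqueue(nvmet_wq);

	/* ... remaining teardown ... */
}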
Cheers,
Hannes