nvme-fc: waiting in invalid context bug
Daniel Wagner
dwagner at suse.de
Wed Aug 23 04:59:39 PDT 2023
On Wed, Aug 23, 2023 at 01:43:00PM +0200, Daniel Wagner wrote:
> While working on FC support in blktests, I ran into the bug report below.
>
> If I read this correctly, ee6fdc5055e9 ("nvme-fc: fix race between error
> recovery and creating association") introduced this bug.
Reverting this commit makes the report go away.
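
From the splat, nvme_fc_connect_ctrl_work() holds the ctrl->lock
spinlock while nvme_change_ctrl_state() ends up in
nvme_kick_requeue_lists(), which does down_read() on the sleeping
namespaces_rwsem. A minimal sketch of that locking pattern (the names
mirror the splat; the simplification is mine, not the actual driver
code):

#include <linux/spinlock.h>
#include <linux/rwsem.h>

static DEFINE_SPINLOCK(ctrl_lock);      /* stands in for ctrl->lock */
static DECLARE_RWSEM(namespaces_rwsem); /* stands in for ctrl->namespaces_rwsem */

static void connect_work_sketch(void)
{
	unsigned long flags;

	spin_lock_irqsave(&ctrl_lock, flags);
	/*
	 * Taking a sleeping lock (rwsem) while holding a spinlock may
	 * schedule in atomic context; with CONFIG_PROVE_RAW_LOCK_NESTING
	 * lockdep reports this as "BUG: Invalid wait context", as seen
	 * in the quoted splat below.
	 */
	down_read(&namespaces_rwsem);
	up_read(&namespaces_rwsem);
	spin_unlock_irqrestore(&ctrl_lock, flags);
}
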
> =============================
> [ BUG: Invalid wait context ]
> 6.5.0-rc2+ #16 Tainted: G W
> -----------------------------
> kworker/u8:5/105 is trying to lock:
> ffff8881127d4748 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_kick_requeue_lists+0x31/0x1d0 [nvme_core]
> other info that might help us debug this:
> context-{4:4}
> 3 locks held by kworker/u8:5/105:
> #0: ffff8881182cd148 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x7a6/0x1180
> #1: ffff888110fa7d20 ((work_completion)(&(&ctrl->connect_work)->work)){+.+.}-{0:0}, at: process_one_work+0x7e9/0x1180
> #2: ffff8881127d4018 (&ctrl->lock#2){....}-{2:2}, at: nvme_fc_connect_ctrl_work+0x1715/0x1be0 [nvme_fc]
> stack backtrace:
> CPU: 1 PID: 105 Comm: kworker/u8:5 Tainted: G W 6.5.0-rc2+ #16 4796ef1f1e7efc9e14ac22d8802d2575bb3a3aef
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
> Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5b/0x80
> __lock_acquire+0x17e8/0x7e70
> ? mark_lock+0x94/0x350
> ? verify_lock_unused+0x150/0x150
> ? verify_lock_unused+0x150/0x150
> ? lock_acquire+0x16d/0x410
> ? process_one_work+0x1180/0x1180
> ? lock_release+0x2aa/0xd30
> ? __cfi_lock_release+0x10/0x10
> ? start_flush_work+0x553/0x610
> lock_acquire+0x16d/0x410
> ? nvme_kick_requeue_lists+0x31/0x1d0 [nvme_core af1437cccf764f8f599077b8e0f169b94f7f9966]
> ? __cfi_lock_acquire+0x10/0x10
> ? __wake_up+0x120/0x200
> ? lock_release+0x2aa/0xd30
> ? nvme_kick_requeue_lists+0x31/0x1d0 [nvme_core af1437cccf764f8f599077b8e0f169b94f7f9966]
> down_read+0xa7/0xa10
> ? nvme_kick_requeue_lists+0x31/0x1d0 [nvme_core af1437cccf764f8f599077b8e0f169b94f7f9966]
> ? try_to_grab_pending+0x86/0x480
> ? __cfi_down_read+0x10/0x10
> ? __cancel_work_timer+0x3a1/0x480
> ? _raw_spin_unlock_irqrestore+0x24/0x50
> ? cancel_work_sync+0x20/0x20
> ? __cfi_lock_release+0x10/0x10
> nvme_kick_requeue_lists+0x31/0x1d0 [nvme_core af1437cccf764f8f599077b8e0f169b94f7f9966]
> nvme_change_ctrl_state+0x208/0x2e0 [nvme_core af1437cccf764f8f599077b8e0f169b94f7f9966]
> nvme_fc_connect_ctrl_work+0x17a4/0x1be0 [nvme_fc 40247846cbe6ec64af4ae5bef38fd58d34ff3bbd]
> process_one_work+0x89c/0x1180
> ? rescuer_thread+0x1150/0x1150
> ? do_raw_spin_trylock+0xc9/0x1f0
> ? lock_acquired+0x310/0x9b0
> ? worker_thread+0xd5e/0x1260
> worker_thread+0x91e/0x1260
> kthread+0x25d/0x2f0
> ? __cfi_worker_thread+0x10/0x10
> ? __cfi_kthread+0x10/0x10
> ret_from_fork+0x41/0x70
> ? __cfi_kthread+0x10/0x10
> ret_from_fork_asm+0x1b/0x30
> RIP: 0000:0x0
> Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
>