[bug report] "BUG: Invalid wait context" at blktests nvme/052

Keith Busch kbusch at kernel.org
Thu Oct 17 10:03:46 PDT 2024


On Thu, Oct 17, 2024 at 07:49:33AM +0000, Shinichiro Kawasaki wrote:
> I observed the failure of blktests test case nvme/052 with the kernel v6.12-rc3.
> The failure cause was the "BUG: Invalid wait context" [1]. I have no idea how to
> fix this. Help for fix will be appreciated.
> 
> Here I share my observations. The test case repeats namespace creation and
> removal for a nvme loop target. The BUG was reported for a udev-worker process.
> I guess udev is trying to read a namespace in the removal process. The BUG
> message notes that "RCU nest depth: 1, expected: 0". From the call trace, I
> think the "Invalid wait" in the RCU reader was in the function call chain for
> the read operation by udev as follows:
> 
> blk_mq_flush_plug_list
>  blk_mq_run_dispatch_ops
>   __blk_mq_run_dispatch_ops
>    rcu_read_lock                               ... RCU reader starts
>    blk_mq_plug_issue_direct
>     blk_mq_request_issue_directly
>      __blk_mq_request_issue_directly
>       q->mq_ops->queue_rq = nvme_loop_queue_rq
>        nvmet_req_init
>         nvmet_req_find_ns
>          nvmet_subsys_nsid_exists
>           mutex_lock                           ... waits in the RCU reader
> 
> I found the mutex_lock was added by the commit 505363957fad ("nvmet: fix nvme
> status code when namespace is disabled") for the kernel v6.9. So, this commit
> might be related to the cause.
> 
> The failure is 100% reproducible on my test node at this moment. I bisected and
> found that the trigger commit is 4e893ca81170 ("nvme_core: scan namespaces
> asynchronously"), which was merged to the kernel v6.12-rc1.

Oof, good find. I think the easiest thing to do is add the
NVME_F_BLOCKING flag to the loop controller ops.
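
For anyone following along, here is a rough user-space model of why the
flag should help, as I understand it. With NVME_F_BLOCKING set, the nvme
core marks the tag set BLK_MQ_F_BLOCKING, and blk-mq then wraps dispatch in
SRCU instead of plain RCU, so ->queue_rq() is allowed to sleep. This is not
kernel code: the flag values and every model_* name are made up for
illustration, and only the flag names themselves come from the kernel.

```c
#include <assert.h>

/* Illustrative values only; the real definitions live in the kernel headers. */
#define NVME_F_FABRICS    (1u << 0)
#define NVME_F_BLOCKING   (1u << 1)
#define BLK_MQ_F_BLOCKING (1u << 0)

/* Models the "RCU nest depth" value printed in the BUG message. */
static int rcu_nest_depth;
/* Counts sleeps attempted while inside a (modelled) RCU read section. */
static int invalid_wait_count;

/* Stand-in for mutex_lock(): sleeping under plain RCU is the reported bug. */
static void model_mutex_lock(void)
{
	if (rcu_nest_depth > 0)
		invalid_wait_count++;
}

/* Stand-in for nvme_loop_queue_rq() reaching nvmet_subsys_nsid_exists(). */
static void model_queue_rq(void)
{
	model_mutex_lock();
}

/* Models the nvme core translating the controller flag to a tag set flag. */
static unsigned int model_tag_set_flags(unsigned int ctrl_flags)
{
	return (ctrl_flags & NVME_F_BLOCKING) ? BLK_MQ_F_BLOCKING : 0;
}

/* Models the branch taken by __blk_mq_run_dispatch_ops(). */
static void model_run_dispatch_ops(unsigned int tag_set_flags)
{
	if (tag_set_flags & BLK_MQ_F_BLOCKING) {
		/* SRCU read section: sleeping is fine, depth is not raised. */
		model_queue_rq();
	} else {
		rcu_nest_depth++;	/* rcu_read_lock() */
		model_queue_rq();
		rcu_nest_depth--;	/* rcu_read_unlock() */
	}
	assert(rcu_nest_depth == 0);
}

/* Returns how many invalid waits one modelled request issue would hit. */
static int model_issue(unsigned int ctrl_flags)
{
	invalid_wait_count = 0;
	model_run_dispatch_ops(model_tag_set_flags(ctrl_flags));
	return invalid_wait_count;
}
```

In this model, model_issue(NVME_F_FABRICS) reports one invalid wait, which
corresponds to the splat above, and
model_issue(NVME_F_FABRICS | NVME_F_BLOCKING) reports none.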

---
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index e32790d8fc260..77dd809fe4507 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -479,7 +479,7 @@ static void nvme_loop_reset_ctrl_work(struct work_struct *work)
 static const struct nvme_ctrl_ops nvme_loop_ctrl_ops = {
        .name                   = "loop",
        .module                 = THIS_MODULE,
-       .flags                  = NVME_F_FABRICS,
+       .flags                  = NVME_F_FABRICS | NVME_F_BLOCKING,
        .reg_read32             = nvmf_reg_read32,
        .reg_read64             = nvmf_reg_read64,
        .reg_write32            = nvmf_reg_write32,
--


