[PATCH 0/1] nvme-loop: avoid cancelling/aborting I/O and admin tagset

Nilay Shroff nilay at linux.ibm.com
Fri Mar 13 04:38:47 PDT 2026


Hi,

During nvme-loop controller reset or shutdown, the current code first
cancels/aborts the I/O and admin tagsets and then proceeds to destroy
the corresponding I/O and admin queues.

For the loop controller this cancellation is unnecessary. The queue
destruction path already waits for all in-flight target I/O and admin
operations to complete, which ensures that no outstanding operations
remain before the queues are torn down.

Cancelling the tagsets first also introduces a small race window where
a late completion from the target may arrive after the corresponding
request tag has been cancelled but before the queues are destroyed.
If this occurs, the completion path may attempt to access a request
whose tag has already been cancelled or freed, which can lead to a
kernel crash. The patch in this series therefore avoids cancelling/aborting
the I/O and admin tagsets for the nvme-loop target, as this step is redundant
and can expose the race described above.

This issue was observed while running blktests nvme/040. The kernel crash
encountered is shown below:

run blktests nvme/040 at 2026-03-08 06:34:27
loop0: detected capacity change from 0 to 2097152
nvmet: adding nsid 1 to subsystem blktests-subsystem-1
nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
nvme nvme6: creating 96 I/O queues.
nvme nvme6: new ctrl: "blktests-subsystem-1"
nvme_log_error: 1 callbacks suppressed
block nvme6n1: no usable path - requeuing I/O
nvme6c6n1: Read(0x2) @ LBA 2096384, 128 blocks, Host Aborted Command (sct 0x3 / sc 0x71) 
blk_print_req_error: 1 callbacks suppressed
I/O error, dev nvme6c6n1, sector 2096384 op 0x0:(READ) flags 0x2880700 phys_seg 1 prio class 2
block nvme6n1: no usable path - requeuing I/O
Kernel attempted to read user page (286) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000286
Faulting instruction address: 0xc00000000090ca18
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
[...]
NIP [c000000000961274] blk_mq_complete_request_remote+0x28/0x2d4
LR [c008000009af1808] nvme_loop_queue_response+0x110/0x290 [nvme_loop]
    Call Trace:
     0xc00000000502c640 (unreliable)
     nvme_loop_queue_response+0x104/0x290 [nvme_loop]
     __nvmet_req_complete+0x80/0x498 [nvmet]
     nvmet_req_complete+0x24/0xf8 [nvmet]
     nvmet_bio_done+0x58/0xcc [nvmet]
     bio_endio+0x250/0x390
     blk_update_request+0x2e8/0x68c
     blk_mq_end_request+0x30/0x5c
     lo_complete_rq+0x94/0x110 [loop]
     blk_complete_reqs+0x78/0x98
     handle_softirqs+0x148/0x454
     do_softirq_own_stack+0x3c/0x50
     __irq_exit_rcu+0x18c/0x1b4
     irq_exit+0x1c/0x34
     do_IRQ+0x114/0x278
     hardware_interrupt_common_virt+0x28c/0x290

The above kernel oops occurred in blk_mq_complete_request_remote():
1319 bool blk_mq_complete_request_remote(struct request *rq)
1320 {
1321         WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
1322 
1323         /*
1324          * For request which hctx has only one ctx mapping,
1325          * or a polled request, always complete locally,
1326          * it's pointless to redirect the completion.
1327          */
1328         if ((rq->mq_hctx->nr_ctx == 1 &&
1329              rq->mq_ctx->cpu == raw_smp_processor_id()) ||
1330              rq->cmd_flags & REQ_POLLED)
1331                 return false;

In the above code on line #1328, when the kernel attempts to dereference
rq->mq_hctx->nr_ctx, it crashes because rq->mq_hctx is NULL. The request
had already been aborted/cancelled when the loop controller reset was
initiated.
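
To make the ordering problem concrete, here is a minimal userspace sketch
of the two orderings. It is only a model, not kernel code and not the patch
itself: the model_* names are hypothetical, and it assumes that a cancelled
request ends up with a NULL hctx pointer, as seen in the oops above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct blk_mq_hw_ctx and struct request. */
struct model_hctx {
	int nr_ctx;
};

struct model_request {
	struct model_hctx *hctx;	/* NULL once the request has been cancelled */
	bool in_flight;			/* the target still owes a completion */
};

/* Host-side cancel: the request is ended and loses its hctx (assumption). */
static void model_cancel(struct model_request *rq)
{
	rq->hctx = NULL;
}

/*
 * Target-side completion. Returns false at the point where the real
 * completion path would dereference a NULL hctx; the kernel has no
 * such check, this guard only keeps the model safe to run.
 */
static bool model_complete(struct model_request *rq)
{
	rq->in_flight = false;
	if (rq->hctx == NULL)
		return false;
	return rq->hctx->nr_ctx >= 1;
}

/* Ordering with cancellation: cancel first, late completion afterwards. */
static bool model_cancel_then_complete(void)
{
	struct model_hctx hctx = { .nr_ctx = 1 };
	struct model_request rq = { .hctx = &hctx, .in_flight = true };

	model_cancel(&rq);
	return model_complete(&rq);
}

/* Ordering without cancellation: completion lands first, then teardown. */
static bool model_drain_then_destroy(void)
{
	struct model_hctx hctx = { .nr_ctx = 1 };
	struct model_request rq = { .hctx = &hctx, .in_flight = true };
	bool ok = model_complete(&rq);

	assert(!rq.in_flight);	/* nothing outstanding when teardown starts */
	rq.hctx = NULL;		/* teardown releases the request */
	return ok;
}
```

In this model, model_cancel_then_complete() reaches the NULL hctx case,
while model_drain_then_destroy() completes the request while it is still
valid. That second ordering is what relying on the queue destruction path
alone is meant to guarantee.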

Nilay Shroff (1):
  nvme-loop: do not cancel I/O and admin tagset during ctrl
    reset/shutdown

 drivers/nvme/target/loop.c | 2 --
 1 file changed, 2 deletions(-)

-- 
2.53.0



