[PATCH 0/2] Fix crash when rescan ns after set queue count cmd timeout
Ruozhu Li
liruozhu at huawei.com
Tue Aug 3 02:06:28 PDT 2021
Hi,
We hit a BUG_ON when rescanning namespaces after the set queue count
command timed out:
--
BUG_ON(hctx_idx >= ctrl->ctrl.queue_count); //nvme_rdma_init_hctx
--
Call trace:
nvme_rdma_init_hctx+0x58/0x60 [nvme_rdma]
blk_mq_realloc_hw_ctxs+0x140/0x4c0
blk_mq_init_allocated_queue+0x130/0x410
blk_mq_init_queue+0x40/0x88
nvme_validate_ns+0xb8/0x740
nvme_scan_work+0x29c/0x460
process_one_work+0x1f8/0x490
worker_thread+0x50/0x4b8
kthread+0x134/0x138
ret_from_fork+0x10/0x18
--
This happened because:
1) During a reconnect, the set queue count feature command timed out.
The host set ctrl->queue_count to 1 and scheduled another reconnect.
2) The next reconnect succeeded but did not create any io queues:
because ctrl->queue_count was 1, the host did not configure io queues
again.
3) Deleting and re-adding a ns on the ctrl made the host rescan
namespaces, and the kernel hit the BUG_ON because hctx_idx was not
less than ctrl->queue_count.
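The three steps above can be modelled with a small user-space sketch in C.
This is only an illustration, not kernel code: struct model_ctrl, its
nr_hw_queues field and the model_* helpers are hypothetical names, and the
assumption is that the block layer still expects the original number of
hardware contexts while queue_count has dropped to 1.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-in for the controller state.
 * This is NOT the kernel's struct nvme_rdma_ctrl.
 */
struct model_ctrl {
	unsigned int queue_count;  /* admin queue + io queues */
	unsigned int nr_hw_queues; /* assumed: hw contexts the block layer expects */
};

/* Step 1: set queue count times out, only the admin queue is left. */
static void model_set_queue_count_timeout(struct model_ctrl *c)
{
	c->queue_count = 1;
}

/*
 * Step 2: the next reconnect succeeds, but io queues are configured
 * only when queue_count > 1. Returns true if io queues were set up.
 */
static bool model_reconnect_configures_io(struct model_ctrl *c,
					  unsigned int wanted_io_queues)
{
	if (c->queue_count > 1) {
		c->queue_count = 1 + wanted_io_queues;
		return true;
	}
	return false; /* io queue setup is skipped */
}

/*
 * Step 3: a rescan initializes every hw context. Returns the first
 * hctx_idx for which the BUG_ON condition (hctx_idx >= queue_count)
 * would be true, or -1 if no index trips it.
 */
static int model_rescan_first_bad_hctx(const struct model_ctrl *c)
{
	for (unsigned int hctx_idx = 0; hctx_idx < c->nr_hw_queues; hctx_idx++) {
		if (hctx_idx >= c->queue_count)
			return (int)hctx_idx;
	}
	return -1;
}
```

In this model, a controller that starts with four io queues stays at
queue_count == 1 after steps 1 and 2, so the rescan in step 3 already
fails the check at the second hardware context.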
We try to fix it with the following patches. Any comments and reviews
are welcome.
Thanks,
Ruozhu
Ruozhu Li (2):
  nvme-rdma: always try to configure io queue when user wants it
  nvme: don't do scan work if io queue count is zero

 drivers/nvme/host/core.c | 6 ++++--
 drivers/nvme/host/rdma.c | 4 +++-
 2 files changed, 7 insertions(+), 3 deletions(-)
--
2.16.4