[PATCH] nvme-fabrics: fix crash for no IO queues
Sagi Grimberg
sagi at grimberg.me
Fri Mar 5 20:58:37 GMT 2021
> A crash happens when Set Features (NVME_FEAT_NUM_QUEUES) times out
> during NVMe over RDMA (RoCE) reconnection. The reason is that a queue
> which has not been allocated is used.
>
> If the queue is not live, requests should not be queued on it.
Can you describe the exact scenario here? What is the controller
state at that point: LIVE or DELETING?
>
> Signed-off-by: Chao Leng <lengchao at huawei.com>
> ---
> drivers/nvme/host/fabrics.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index 733010d2eafd..2479744fc349 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -189,7 +189,7 @@ static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
>  {
>  	if (likely(ctrl->state == NVME_CTRL_LIVE ||
>  		   ctrl->state == NVME_CTRL_DELETING))
> -		return true;
> +		return queue_live;
>  	return __nvmf_check_ready(ctrl, rq, queue_live);
>  }
There were some issues in the past that made us allow submitting
requests in the DELETING state and introduce DELETING_NOIO. See
commit ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work
and/or ana_work").
The driver should be able to accept I/O in DELETING, because the core
changes the state to DELETING_NOIO _before_ it calls ->delete_ctrl. So I
don't understand how you get here with a queue that is not allocated...
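
To make the behavioural difference concrete, here is a minimal userspace
sketch of the check before and after the proposed change. It is not driver
code: the enum, both check_ready_*() functions and slow_path_check() are
hypothetical stand-ins, and slow_path_check() is a deliberately simplified
placeholder for __nvmf_check_ready(), whose real logic is more involved.

```c
#include <stdbool.h>

/* Simplified stand-ins for the controller states named in this thread. */
enum ctrl_state {
	CTRL_LIVE,
	CTRL_CONNECTING,
	CTRL_DELETING,
	CTRL_DELETING_NOIO,
};

/*
 * Hypothetical placeholder for the slow path (__nvmf_check_ready() in the
 * driver). Assumption for this sketch only: it accepts a request only when
 * the queue is live.
 */
static bool slow_path_check(enum ctrl_state state, bool queue_live)
{
	(void)state;
	return queue_live;
}

/* Current behaviour: LIVE and DELETING accept requests unconditionally. */
static bool check_ready_current(enum ctrl_state state, bool queue_live)
{
	if (state == CTRL_LIVE || state == CTRL_DELETING)
		return true;
	return slow_path_check(state, queue_live);
}

/* Proposed behaviour: LIVE and DELETING accept only on a live queue. */
static bool check_ready_patched(enum ctrl_state state, bool queue_live)
{
	if (state == CTRL_LIVE || state == CTRL_DELETING)
		return queue_live;
	return slow_path_check(state, queue_live);
}
```

In this model the two variants differ only when the state is LIVE or
DELETING and the queue is not live, which is why the exact state in the
crash scenario matters.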