[PATCH] nvme-fabrics: fix crash for no IO queues

Sagi Grimberg sagi at grimberg.me
Fri Mar 5 20:58:37 GMT 2021


> A crash happens when Set Features (NVME_FEAT_NUM_QUEUES) times out during
> NVMe over RDMA (RoCE) reconnection. The cause is that a queue which has not
> been allocated is used.
> 
> If the queue is not live, we should not allow requests to be queued on it.

Can you describe the exact scenario here? What is the controller state
at that point: LIVE or DELETING?

> 
> Signed-off-by: Chao Leng <lengchao at huawei.com>
> ---
>   drivers/nvme/host/fabrics.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
> index 733010d2eafd..2479744fc349 100644
> --- a/drivers/nvme/host/fabrics.h
> +++ b/drivers/nvme/host/fabrics.h
> @@ -189,7 +189,7 @@ static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
>   {
>   	if (likely(ctrl->state == NVME_CTRL_LIVE ||
>   		   ctrl->state == NVME_CTRL_DELETING))
> -		return true;
> +		return queue_live;
>   	return __nvmf_check_ready(ctrl, rq, queue_live);
>   }
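To make the behavioral change concrete, here is a minimal userspace model of the fast path before and after the patch. It is a sketch, not kernel code: the CTRL_* states, slow_path() and the check_ready_*() helpers are hypothetical names, and slow_path() is assumed to reduce to returning queue_live, omitting the other cases the real __nvmf_check_ready() handles.

```c
#include <stdbool.h>

/* Hypothetical subset of controller states, mirroring NVME_CTRL_*. */
enum ctrl_state { CTRL_LIVE, CTRL_CONNECTING, CTRL_DELETING };

/* Assumption: simplified stand-in for __nvmf_check_ready(); only the
 * final "is the queue live?" decision is modeled. */
static bool slow_path(bool queue_live)
{
	return queue_live;
}

/* Fast path as it reads before the patch: LIVE/DELETING admit the
 * request regardless of queue_live. */
static bool check_ready_orig(enum ctrl_state s, bool queue_live)
{
	if (s == CTRL_LIVE || s == CTRL_DELETING)
		return true;
	return slow_path(queue_live);
}

/* Fast path as proposed: LIVE/DELETING now defer to queue_live. */
static bool check_ready_patched(enum ctrl_state s, bool queue_live)
{
	if (s == CTRL_LIVE || s == CTRL_DELETING)
		return queue_live;
	return slow_path(queue_live);
}
```

In this model, check_ready_orig(CTRL_DELETING, false) returns true, i.e. a request is admitted to a queue that is not live, while check_ready_patched() returns false for the same input. Note that with the simplified slow path the patched function always returns queue_live, which is why the question of how the state can be LIVE or DELETING with an unallocated queue matters.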

There were some issues in the past that led us to allow submitting
requests in the DELETING state and to introduce DELETING_NOIO. See
patch ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work
and/or ana_work").

The driver should be able to accept I/O in DELETING, because the core
changes the state to DELETING_NOIO _before_ it calls ->delete_ctrl. So I
don't understand how you get here if the queue is not allocated...
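The ordering argument can be sketched as an invariant check. This is a hypothetical model, not the nvme core: model_ctrl, transport_delete_ctrl() and the CTRL_* states are illustrative names, and "queue freed" stands in for whatever ->delete_ctrl actually tears down. The invariant is that the LIVE/DELETING fast path never admits a request while the queue is unallocated.

```c
#include <stdbool.h>

/* Hypothetical states mirroring NVME_CTRL_LIVE/DELETING/DELETING_NOIO. */
enum ctrl_state { CTRL_LIVE, CTRL_DELETING, CTRL_DELETING_NOIO };

struct model_ctrl {
	enum ctrl_state state;
	bool queue_allocated;
};

/* States in which the fast path admits requests without checking the queue. */
static bool fast_path_admits(const struct model_ctrl *c)
{
	return c->state == CTRL_LIVE || c->state == CTRL_DELETING;
}

/* Hypothetical stand-in for the transport's ->delete_ctrl teardown. */
static void transport_delete_ctrl(struct model_ctrl *c)
{
	c->queue_allocated = false;
}

/* Invariant: the fast path never admits I/O to an unallocated queue. */
static bool invariant_holds(const struct model_ctrl *c)
{
	return !fast_path_admits(c) || c->queue_allocated;
}

/* Ordering described in the reply: DELETING (I/O still allowed), then
 * DELETING_NOIO, and only then ->delete_ctrl. Checks after each step. */
static bool delete_sequence_is_safe(void)
{
	struct model_ctrl c = { CTRL_LIVE, true };

	c.state = CTRL_DELETING;
	if (!invariant_holds(&c))
		return false;
	c.state = CTRL_DELETING_NOIO;
	if (!invariant_holds(&c))
		return false;
	transport_delete_ctrl(&c);
	return invariant_holds(&c);
}

/* Counter-example: queues torn down while the state is still DELETING. */
static bool wrong_order_is_safe(void)
{
	struct model_ctrl c = { CTRL_DELETING, true };

	transport_delete_ctrl(&c);
	return invariant_holds(&c);
}
```

Under these assumptions, delete_sequence_is_safe() returns true and wrong_order_is_safe() returns false: the crash requires the queue to be gone while the state still takes the LIVE/DELETING fast path, which is why the exact state in the reported scenario matters.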
