[PATCH] nvme-fabrics: fix crash for no IO queues
Chao Leng
lengchao at huawei.com
Mon Mar 8 01:30:47 GMT 2021
On 2021/3/6 4:58, Sagi Grimberg wrote:
>
>> A crash happens when a Set Features (NVME_FEAT_NUM_QUEUES) command
>> times out during NVMe over RDMA (RoCE) reconnection; the cause is
>> using a queue that was never allocated.
>>
>> If the queue is not live, queue requests should not be allowed.
>
> Can you describe exactly the scenario here? What is the state
> here? LIVE? or DELETING?
If setting the feature (NVME_FEAT_NUM_QUEUES) fails due to a timeout, or
the target returns 0 I/O queues, nvme_set_queue_count will return 0, and
the reconnection will then continue and succeed. The controller state is
LIVE. Requests continue to be delivered via ->queue_rq(), and then the
crash happens.
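To make the failure sequence concrete, here is a minimal user-space sketch of the control flow described above. It is not the kernel code: the structs and helper names (set_queue_count, reconnect, queue_rq_would_be_safe) are simplified stand-ins that model how a timed-out Set Features (or a target granting 0 I/O queues) lets reconnection succeed with no I/O queues allocated, so a later ->queue_rq() would touch an unallocated queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplification of the reconnect path described above. */

struct queue { bool allocated; };

struct ctrl {
	int queue_count;         /* admin queue + I/O queues */
	struct queue *io_queues; /* NULL when no I/O queues were allocated */
};

/*
 * Models nvme_set_queue_count(): on a Set Features timeout, or when the
 * target grants 0 I/O queues, it reports success with *count == 0 so
 * that reconnection can still proceed.
 */
static int set_queue_count(bool timed_out, int granted, int *count)
{
	if (timed_out || granted == 0) {
		*count = 0;
		return 0; /* not treated as an error: reconnect continues */
	}
	*count = granted;
	return 0;
}

/*
 * Reconnect "succeeds" and the controller goes LIVE even when 0 I/O
 * queues were granted; I/O queue allocation is skipped entirely.
 */
static bool reconnect(struct ctrl *c, bool timed_out, int granted)
{
	static struct queue dummy = { .allocated = true };
	int nr = 0;

	set_queue_count(timed_out, granted, &nr);
	c->queue_count = nr + 1;              /* +1 for the admin queue */
	c->io_queues = nr ? &dummy : NULL;    /* allocation elided */
	return true;                          /* state becomes LIVE */
}

/*
 * ->queue_rq() dereferences an I/O queue; with none allocated that is
 * the crash the patch prevents. Here we only report whether the
 * dereference would be safe.
 */
static bool queue_rq_would_be_safe(struct ctrl *c)
{
	return c->queue_count > 1 && c->io_queues != NULL;
}
```

With timed_out = true the sketch reaches a LIVE controller whose io_queues pointer is NULL, which is exactly the state in which an unguarded ->queue_rq() crashes.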
>
>>
>> Signed-off-by: Chao Leng <lengchao at huawei.com>
>> ---
>> drivers/nvme/host/fabrics.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
>> index 733010d2eafd..2479744fc349 100644
>> --- a/drivers/nvme/host/fabrics.h
>> +++ b/drivers/nvme/host/fabrics.h
>> @@ -189,7 +189,7 @@ static inline bool nvmf_check_ready(struct nvme_ctrl *ctrl, struct request *rq,
>> {
>> if (likely(ctrl->state == NVME_CTRL_LIVE ||
>> ctrl->state == NVME_CTRL_DELETING))
>> - return true;
>> + return queue_live;
>> return __nvmf_check_ready(ctrl, rq, queue_live);
>> }
>
> There were some issues in the past that made us allow submitting
> requests in DELETING state and introducing DELETING_NOIO. See
> patch ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
This doesn't make any difference. When in the DELETING state the queue is
still live.
>
> The driver should be able to accept I/O in DELETING because the core
> changes the state to DELETING_NOIO _before_ it calls ->delete_ctrl so I
> don't understand how you get to this if the queue is not allocated...
The controller state is LIVE; the deletion process looks fine and is not
involved here.
More information about the Linux-nvme mailing list