[PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion

Thu Jan 14 16:25:28 EST 2021

>>> When queueing a request fails, the blk_status_t is returned directly
>>> to blk-mq. If the blk_status_t is not BLK_STS_RESOURCE,
>>> BLK_STS_DEV_RESOURCE or BLK_STS_ZONE_RESOURCE, blk-mq calls
>>> blk_mq_end_request to complete the request with BLK_STS_IOERR.
>>> In two scenarios, the request should be retried and may succeed.
>>> First, when working with nvme multipath, the request may be retried
>>> successfully on another path, because the error is probably related to
>>> the path. Second, when working without multipath software, the request
>>> may be retried successfully after error recovery.
>>> If the request is completed with BLK_STS_IOERR in
>>> blk_mq_dispatch_rq_list, the state of the request may have been
>>> changed to MQ_RQ_IN_FLIGHT. If the request is freed asynchronously,
>>> such as in nvme_submit_user_cmd, then in an extreme scenario the
>>> request will be freed again during teardown.
>>> If a non-resource error occurs in queue_rq, we should call
>>> nvme_complete_rq directly to complete the request and set the state
>>> of the request to MQ_RQ_COMPLETE. nvme_complete_rq will decide whether
>>> to retry, fail over or end the request.
>>>
>>> Signed-off-by: Chao Leng <lengchao at huawei.com>
>>> ---
>>>   drivers/nvme/host/rdma.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>>> index df9f6f4549f1..4a89bf44ecdc 100644
>>> --- a/drivers/nvme/host/rdma.c
>>> +++ b/drivers/nvme/host/rdma.c
>>> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>>>   unmap_qe:
>>>       ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>>>                   DMA_TO_DEVICE);
>>> -    return ret;
>>> +    return nvme_try_complete_failed_req(rq, ret);
>>
>> I don't understand this. There are errors that may not be related to
>> anything pathing related (sw bug, memory leak, mapping error,
>> etc., etc.), so why should we return this one-shot error?
> Although failover retry is not required, if we return the error to
> blk-mq, a low-probability crash may happen, because blk-mq does not set
> the state of the request to MQ_RQ_COMPLETE before completing the
> request, and the request may be freed asynchronously, such as in
> nvme_submit_user_cmd. If this races with error recovery, a double
> completion of the request may happen.

Then fix that, don't work around it.
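
For reference, the race described in the quoted paragraph can be modelled
in user space. The sketch below is only an illustration under assumed
names: every toy_* identifier is hypothetical, nothing here is blk-mq or
nvme code, and it is not a proposal for where the real fix should live.
It shows that when a completer must first move the state from in-flight
to complete, a second completion attempt becomes a no-op, whereas an
unguarded completion path counts the request twice.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy request states, loosely named after the MQ_RQ_* states in the thread. */
enum toy_rq_state { TOY_RQ_IDLE, TOY_RQ_IN_FLIGHT, TOY_RQ_COMPLETE };

/* Hypothetical stand-in for a request; not struct request. */
struct toy_request {
	atomic_int state;
	int completions;	/* how many times completion work actually ran */
};

/* Put the toy request into the in-flight state with no completions yet. */
static void toy_request_start(struct toy_request *rq)
{
	assert(rq != NULL);
	atomic_init(&rq->state, TOY_RQ_IN_FLIGHT);
	rq->completions = 0;
}

/* Returns true only for the single caller that wins IN_FLIGHT -> COMPLETE. */
static bool toy_mark_complete(struct toy_request *rq)
{
	int expected = TOY_RQ_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      TOY_RQ_COMPLETE);
}

/* Guarded completion: a second caller sees COMPLETE and backs off. */
static bool toy_complete_guarded(struct toy_request *rq)
{
	if (!toy_mark_complete(rq))
		return false;
	rq->completions++;
	return true;
}

/* Unguarded completion: never looks at the state, so it can run twice. */
static void toy_complete_unguarded(struct toy_request *rq)
{
	rq->completions++;
}
```

Calling toy_complete_guarded() twice on the same toy request runs the
completion work once; calling toy_complete_unguarded() twice runs it
twice, which is the double completion in miniature.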

> 
> So we cannot return the error to blk-mq if the blk_status_t is not
> BLK_STS_RESOURCE, BLK_STS_DEV_RESOURCE or BLK_STS_ZONE_RESOURCE.

This is not something we should be handling in nvme. block drivers
should be able to fail queue_rq, and this all should live in the
block layer.
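
To make the status handling under discussion concrete, here is a minimal
user-space sketch of the classification the patch description attributes
to the dispatch path: the three resource statuses lead to a requeue, and
any other failure ends the request with an I/O error. The toy_* names and
the TOY_STS_OTHER value are assumptions made for illustration; this is
not the code in blk_mq_dispatch_rq_list.

```c
#include <assert.h>

/* Toy stand-ins for the blk_status_t values named in this thread. */
enum toy_status {
	TOY_STS_OK,
	TOY_STS_RESOURCE,
	TOY_STS_DEV_RESOURCE,
	TOY_STS_ZONE_RESOURCE,
	TOY_STS_IOERR,
	TOY_STS_OTHER,		/* hypothetical: any other driver failure */
};

/* What the toy dispatcher does with a queue_rq-style return value. */
enum toy_action {
	TOY_ACT_NONE,		/* request was queued, nothing more to do */
	TOY_ACT_REQUEUE,	/* keep the request and dispatch it again later */
	TOY_ACT_END_IOERR,	/* end the request with an I/O error */
};

/* Map a returned status to the action described in the patch text. */
static enum toy_action toy_dispatch_action(enum toy_status ret)
{
	switch (ret) {
	case TOY_STS_OK:
		return TOY_ACT_NONE;
	case TOY_STS_RESOURCE:
	case TOY_STS_DEV_RESOURCE:
	case TOY_STS_ZONE_RESOURCE:
		return TOY_ACT_REQUEUE;
	default:
		return TOY_ACT_END_IOERR;
	}
}

/* Small self-check covering one status from each class. */
static void toy_selfcheck(void)
{
	assert(toy_dispatch_action(TOY_STS_OK) == TOY_ACT_NONE);
	assert(toy_dispatch_action(TOY_STS_RESOURCE) == TOY_ACT_REQUEUE);
	assert(toy_dispatch_action(TOY_STS_OTHER) == TOY_ACT_END_IOERR);
}
```

In this model a driver is free to fail its queue_rq-style callback with
any status; what happens next is decided in one place, by the dispatcher.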