[PATCH v2 4/6] nvme-rdma: avoid IO error and repeated request completion

Wed Jan 13 19:19:47 EST 2021

> When queuing a request fails, the blk_status_t is returned directly
> to blk-mq. If the blk_status_t is not BLK_STS_RESOURCE,
> BLK_STS_DEV_RESOURCE, or BLK_STS_ZONE_RESOURCE, blk-mq calls
> blk_mq_end_request to complete the request with BLK_STS_IOERR.
> In two scenarios, the request should be retried and may succeed.
> First, with nvme multipath, the request may be retried
> successfully on another path, because the error is probably related to
> the path. Second, without multipath software, the request may
> be retried successfully after error recovery.
> If the request is completed with BLK_STS_IOERR in
> blk_mq_dispatch_rq_list, the state of the request may have been
> changed to MQ_RQ_IN_FLIGHT. If the request is freed asynchronously,
> such as in nvme_submit_user_cmd, then in an extreme scenario the
> request will be freed a second time during teardown.
> If a non-resource error occurs in queue_rq, the driver should directly
> call nvme_complete_rq to complete the request and set the state of the
> request to MQ_RQ_COMPLETE. nvme_complete_rq will decide whether to
> retry, fail over, or end the request.
> 
> Signed-off-by: Chao Leng <lengchao at huawei.com>
> ---
>   drivers/nvme/host/rdma.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index df9f6f4549f1..4a89bf44ecdc 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -2093,7 +2093,7 @@ static blk_status_t nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
>   unmap_qe:
>   	ib_dma_unmap_single(dev, req->sqe.dma, sizeof(struct nvme_command),
>   			    DMA_TO_DEVICE);
> -	return ret;
> +	return nvme_try_complete_failed_req(rq, ret);

I don't understand this. There are errors that have nothing to do with
the path (sw bug, memory leak, mapping error, etc.). Why should we
return this one-shot error?
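
For reference, here is a rough userspace model of the two behaviours under
discussion. It is only a sketch: every toy_* name is hypothetical, the
body of nvme_try_complete_failed_req is not shown in this hunk, and the
helper's shape below is an assumption based on the commit message
(resource errors are passed through, any other error is completed by the
driver and reported to the caller as OK).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the blk_status_t values named above. */
enum toy_status {
	TOY_STS_OK,
	TOY_STS_RESOURCE,
	TOY_STS_DEV_RESOURCE,
	TOY_STS_ZONE_RESOURCE,
	TOY_STS_IOERR,
};

/* What the dispatch loop does with the status returned by queue_rq. */
enum toy_outcome {
	TOY_QUEUED,           /* request accepted by the driver */
	TOY_REQUEUED,         /* resource error: request is tried again later */
	TOY_ENDED_IOERR,      /* any other error: ended with an I/O error */
	TOY_DRIVER_COMPLETED, /* driver already completed the request itself */
};

struct toy_request {
	bool driver_completed;
};

static bool toy_is_resource_status(enum toy_status s)
{
	return s == TOY_STS_RESOURCE ||
	       s == TOY_STS_DEV_RESOURCE ||
	       s == TOY_STS_ZONE_RESOURCE;
}

/*
 * Assumed shape of the helper: pass OK and resource errors through
 * unchanged; for anything else, let the driver complete the request
 * (where retry/failover would be decided) and report OK to the caller.
 */
static enum toy_status toy_try_complete_failed_req(struct toy_request *rq,
						   enum toy_status ret)
{
	assert(rq != NULL);
	if (ret == TOY_STS_OK || toy_is_resource_status(ret))
		return ret;
	rq->driver_completed = true;
	return TOY_STS_OK;
}

/* Model of the dispatch decision described in the commit message. */
static enum toy_outcome toy_dispatch(const struct toy_request *rq,
				     enum toy_status ret)
{
	assert(rq != NULL);
	if (toy_is_resource_status(ret))
		return TOY_REQUEUED;
	if (ret != TOY_STS_OK)
		return TOY_ENDED_IOERR;
	return rq->driver_completed ? TOY_DRIVER_COMPLETED : TOY_QUEUED;
}
```

In this model, returning TOY_STS_IOERR straight to toy_dispatch ends the
request with an I/O error, while routing it through
toy_try_complete_failed_req first hides the error from the dispatch loop.
Under that assumption, every non-resource error takes the driver
completion path, whether or not it is related to the path.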