[PATCH 5/6] nvme-rdma: fix timeout handler

Wed Aug 5 03:19:43 EDT 2020

>>>> The request being timed out cannot be completed after the queue is
>>>> stopped; that is the point of nvme_rdma_stop_queue. If it is only
>>>> ALLOCATED, we have not yet connected, hence there is zero chance for
>>>> any command to complete.
>>> The request may already have completed before the queue is stopped: it
>>> is in the cq but has not yet been processed by software.
>>
>> Not possible, ib_drain_cq completion guarantees that all cqes were
>> reaped and handled by SW.
>>
>>> If nvme_rdma_stop_queue concurrent
>>
>> Before we complete we make sure the queue is stopped (and drained and
>> reaped).
>>
>>> , for example:
>>> The error recovery runs first; it clears the NVME_RDMA_Q_LIVE flag
>>> and then waits for the cq drain. At the same time, nvme_rdma_timeout
>>> calls nvme_rdma_stop_queue, which returns immediately, and then may
>>> call blk_mq_complete_request. But error recovery may drain the cq at
>>> the same time, and may also handle the same request.
>>
>> We flush the err_work before running nvme_rdma_stop_queue exactly
>> because of that. Your example cannot happen.
> Flush work is not safe. See my previous email.

How is it not safe? When flush_work returns, the work is guaranteed
to have finished execution, and we only do that for the
RESETTING/CONNECTING states, which means that the work has either
already started or already finished.
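
To make the ordering argument concrete, here is a minimal userspace sketch
of the "flush first, then stop and complete" pattern. It is not the driver
code: every toy_* name is hypothetical, pthreads stand in for the kernel
workqueue, and toy_flush_work only models the one property relied on above
(it returns only after the queued work has finished executing).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel work item (not struct work_struct). */
struct toy_work {
	pthread_mutex_t lock;
	pthread_cond_t idle;
	bool busy;		/* queued or currently executing */
	pthread_t thread;
};

/* Hypothetical queue with a single outstanding request. */
struct toy_queue {
	bool live;		/* models a "queue is live" flag */
	bool req_done;		/* the request has been completed */
	int completions;	/* how many times it was completed */
	struct toy_work err_work;
};

/* Models error recovery: stop the queue, then reap the outstanding request. */
static void *toy_err_work_fn(void *arg)
{
	struct toy_queue *q = arg;
	struct toy_work *w = &q->err_work;

	q->live = false;
	if (!q->req_done) {
		q->req_done = true;
		q->completions++;
	}

	/* Mark the work idle and wake any flusher. */
	pthread_mutex_lock(&w->lock);
	w->busy = false;
	pthread_cond_broadcast(&w->idle);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Mark the work busy before it can run, then start it. */
static int toy_queue_work(struct toy_queue *q)
{
	struct toy_work *w = &q->err_work;

	pthread_mutex_lock(&w->lock);
	w->busy = true;
	pthread_mutex_unlock(&w->lock);
	return pthread_create(&w->thread, NULL, toy_err_work_fn, q);
}

/* Returns only once the queued work has finished executing. */
static void toy_flush_work(struct toy_work *w)
{
	pthread_mutex_lock(&w->lock);
	while (w->busy)
		pthread_cond_wait(&w->idle, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Models the timeout path: flush, stop, then complete if still pending. */
static void toy_timeout(struct toy_queue *q)
{
	toy_flush_work(&q->err_work);
	q->live = false;	/* already false if recovery ran */
	if (!q->req_done) {
		q->req_done = true;
		q->completions++;
	}
}

/* Returns the number of completions observed, or -1 on setup failure. */
int run_timeout_demo(void)
{
	struct toy_queue q = { .live = true };
	int completions;

	pthread_mutex_init(&q.err_work.lock, NULL);
	pthread_cond_init(&q.err_work.idle, NULL);

	if (toy_queue_work(&q) != 0)
		return -1;
	toy_timeout(&q);
	pthread_join(q.err_work.thread, NULL);

	assert(!q.live);
	completions = q.completions;
	pthread_cond_destroy(&q.err_work.idle);
	pthread_mutex_destroy(&q.err_work.lock);
	return completions;
}
```

In this model run_timeout_demo() returns 1: the timeout path cannot look at
the request until the recovery work has finished, so the two paths never
complete the same request. Whether the real flush is safe in every
controller state is the question under discussion; the sketch only covers
the case where the work was queued before the flush.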