[PATCH 5/6] nvme-rdma: fix timeout handler

Sagi Grimberg sagi at grimberg.me
Tue Aug 4 21:12:51 EDT 2020


>>> may be interrupted by a hard interrupt, and the timeout process may then
>>> flush the work at that point. Thus error recovery and
>>> nvme_rdma_complete_timed_out may stop the queue concurrently. As a
>>> result, error recovery may cancel the request, or
>>> nvme_rdma_complete_timed_out may complete it, while the queue is not yet
>>> stopped, which leads to abnormal behavior.
>>
>> We should be fine and safe to complete the I/O.
> 
> Completing the request in nvme_rdma_timeout, or cancelling it in
> nvme_rdma_error_recovery_work or nvme_rdma_reset_ctrl_work, is not safe,
> because the queue may not really be stopped; the flag
> NVME_RDMA_Q_ALLOCATED may merely have been cleared for the queue. Thus one
> request may be handled concurrently by two processes, which is not allowed.

The request being timed out cannot be completed after the queue is
stopped; that is the point of nvme_rdma_stop_queue. If the queue is only
ALLOCATED, we did not yet connect, hence there is zero chance for
any command to complete.
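
To make the argument concrete, here is a rough userspace sketch of the
two states involved. It is not the driver code: the model_* names, the
Q_ALLOCATED/Q_LIVE bits and the teardowns counter are hypothetical, and
it assumes a separate "live" bit that is only set once the queue has
connected. It shows only that a queue which is merely allocated cannot
see completions, and that stopping is a one-shot transition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the queue state bits; not the kernel definitions. */
enum {
	Q_ALLOCATED = 1u << 0,	/* resources exist, not yet connected */
	Q_LIVE      = 1u << 1,	/* connected; commands may complete */
};

struct model_queue {
	atomic_uint flags;
	int teardowns;		/* times the transport was actually quiesced */
};

void model_alloc_queue(struct model_queue *q)
{
	atomic_init(&q->flags, Q_ALLOCATED);
	q->teardowns = 0;
}

/* Assumption: the live bit is set only after a successful connect. */
void model_start_queue(struct model_queue *q)
{
	atomic_fetch_or(&q->flags, Q_LIVE);
}

/* A completion can only arrive on a queue that is live. */
bool model_can_complete(struct model_queue *q)
{
	return (atomic_load(&q->flags) & Q_LIVE) != 0;
}

/*
 * Clear the live bit atomically. Only the caller that observed the bit
 * set performs the teardown, so a second or concurrent stop is a no-op.
 */
void model_stop_queue(struct model_queue *q)
{
	unsigned int old = atomic_fetch_and(&q->flags, ~(unsigned int)Q_LIVE);

	if (!(old & Q_LIVE))
		return;
	q->teardowns++;
}
```

In this model, stopping an allocated-only queue does nothing, and once a
live queue has been stopped, model_can_complete() stays false.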


