[PATCH 3/3] nvme: redirect commands on dying queue

Sun Aug 16 23:54:52 EDT 2020


On 2020/8/15 14:55, Christoph Hellwig wrote:
> On Fri, Aug 14, 2020 at 11:44:12AM -0700, Sagi Grimberg wrote:
>>
>>> If a command sent through nvme-multipath fails on a dying queue, resend it
>>> on another path.
>>
>> So this is a race where we got a retry-able status from the controller
>> (not from the host teardown sequence) and we just happen to see
>> a dying queue?
> 
> I think so, maybe Chao can explain the scenario in a little more detail.
The scenario: an I/O has already completed with a non-path error (such as
NVME_SC_CMD_INTERRUPTED or NVME_SC_DATA_XFER_ERROR) but is still waiting
to be processed. At the same time the controller is deleted, and the
delete path may set QUEUE_FLAG_DYING on the queue when it calls
nvme_remove_namespaces. Then, for example, if the fabric is RDMA, the
delete path calls nvme_rdma_delete_ctrl, which drains the QP first. As a
result, the I/O that returned with a non-path error cannot be failed over
to another path and cannot be retried locally either, so the I/O is
interrupted.
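
To make the decision concrete, here is a rough user-space model of the
scenario. It is only a sketch, not kernel code: struct model_req, enum
disposition, decide_before() and decide_after() are hypothetical names
made up for illustration, and decide_after() reflects only my reading of
the patch description ("resend it on another path"), not the actual diff.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; these are not the kernel's structures. */
enum disposition { COMPLETE, RETRY_LOCAL, FAILOVER };

struct model_req {
	bool is_path_error; /* status is classified as a path error */
	bool is_multipath;  /* submitted through nvme-multipath */
	bool queue_dying;   /* QUEUE_FLAG_DYING is set on the request's queue */
	bool retryable;     /* retries remain and retry is not forbidden */
};

/* Behaviour described in the scenario: a non-path error on a dying
 * queue is neither failed over nor retried locally. */
enum disposition decide_before(const struct model_req *r)
{
	if (!r->retryable)
		return COMPLETE;
	if (r->is_multipath && r->is_path_error)
		return FAILOVER;
	if (!r->queue_dying)
		return RETRY_LOCAL;
	return COMPLETE; /* the I/O is interrupted */
}

/* Assumed intent of the patch: a multipath command that failed on a
 * dying queue is redirected to another path as well. */
enum disposition decide_after(const struct model_req *r)
{
	if (!r->retryable)
		return COMPLETE;
	if (r->is_multipath && (r->is_path_error || r->queue_dying))
		return FAILOVER;
	if (!r->queue_dying)
		return RETRY_LOCAL;
	return COMPLETE;
}
```

With is_multipath, queue_dying and retryable set and is_path_error
clear, decide_before() returns COMPLETE, while decide_after() returns
FAILOVER. With queue_dying clear, both return RETRY_LOCAL.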