[PATCH 3/3] nvme: redirect commands on dying queue

Chao Leng lengchao at huawei.com
Sun Aug 16 23:54:52 EDT 2020



On 2020/8/15 14:55, Christoph Hellwig wrote:
> On Fri, Aug 14, 2020 at 11:44:12AM -0700, Sagi Grimberg wrote:
>>
>>> If a command sent through nvme-multipath fails on a dying queue, resend it
>>> on another path.
>>
>> So this is a race where we got a retry-able status from the controller
>> (not from the host teardown sequence) and we just happen to see
>> a dying queue?
> 
> I think so, maybe Chao can explain the scenario in a little more detail.
The scenario: an I/O has already completed with a non-path error (such as
NVME_SC_CMD_INTERRUPTED or NVME_SC_DATA_XFER_ERROR) but is still waiting
to be processed. At the same time, a controller delete starts, and
nvme_remove_namespaces() may set QUEUE_FLAG_DYING on the queue. Then, for
example with the rdma fabric, the delete calls nvme_rdma_delete_ctrl(),
which drains the QP first. The I/O that returned with a non-path error
therefore can neither be failed over to another path nor be retried
locally, so the I/O is interrupted.
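
To make the decision concrete, here is a rough userspace sketch of the
completion logic as I understand it from the scenario above. It is not
the kernel code: struct io_state, decide_before() and decide_after() are
hypothetical names for illustration, and the flags are simplified to
booleans. decide_before() models the behaviour described above, and
decide_after() models what the patch is meant to achieve.

```c
#include <stdbool.h>

/* Hypothetical model for illustration only, not the kernel implementation. */
enum disposition { COMPLETE, RETRY_LOCAL, FAILOVER };

struct io_state {
	bool multipath;   /* request was submitted through nvme-multipath */
	bool path_error;  /* completion status is a path-related error */
	bool retryable;   /* status still allows a retry */
	bool queue_dying; /* QUEUE_FLAG_DYING is set on the request queue */
};

/* Scenario as described: only path errors fail over, and a dying
 * queue blocks the local retry, so the I/O is completed with an error. */
enum disposition decide_before(const struct io_state *io)
{
	if (!io->retryable)
		return COMPLETE;
	if (io->multipath && io->path_error)
		return FAILOVER;
	if (io->queue_dying)
		return COMPLETE;
	return RETRY_LOCAL;
}

/* Intended behaviour: a multipath request on a dying queue is
 * redirected to another path even if the status is not a path error. */
enum disposition decide_after(const struct io_state *io)
{
	if (!io->retryable)
		return COMPLETE;
	if (io->multipath && (io->path_error || io->queue_dying))
		return FAILOVER;
	if (io->queue_dying)
		return COMPLETE;
	return RETRY_LOCAL;
}
```

For a multipath request with a retryable non-path error on a dying
queue, decide_before() returns COMPLETE while decide_after() returns
FAILOVER. Non-multipath requests behave the same in both versions.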


