[PATCH v1 1/3] driver core: Support asynchronous driver shutdown
Daniel Wagner
dwagner at suse.de
Thu Mar 31 05:07:44 PDT 2022
On Wed, Mar 30, 2022 at 02:12:18PM +0000, Belanger, Martin wrote:
> I know this patch is mainly for PCI devices, however, NVMe over Fabrics
> devices can suffer even longer shutdowns. Last September, I reported
> that shutting down an NVMe-oF TCP connection while the network is down
> will result in a 1-minute deadlock. That's because the driver tries to perform
> a proper shutdown by sending commands to the remote target and the
> timeout for unanswered commands is 1-minute. If one needs to shut down
> several NVMe-oF connections, each connection will be shut down sequentially
> taking each 1 minute. Try running "nvme disconnect-all" while the network
> is down and you'll see what I mean. Of course, the KATO is supposed to
> detect when connectivity is lost, but if you have a long KATO (e.g. 2 minutes)
> you will most likely hit this condition.
I've been debugging something similar:
[44888.710527] nvme nvme0: Removing ctrl: NQN "xxx"
[44898.981684] nvme nvme0: failed to send request -32
[44960.982977] nvme nvme0: queue 0: timeout request 0x18 type 4
[44960.983099] nvme nvme0: Property Set error: 881, offset 0x14
Currently testing this patch:
+++ b/drivers/nvme/host/tcp.c
@@ -1103,9 +1103,12 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
 	if (ret == -EAGAIN) {
 		ret = 0;
 	} else if (ret < 0) {
+		struct request *rq = blk_mq_rq_from_pdu(queue->request);
+
 		dev_err(queue->ctrl->ctrl.device,
 			"failed to send request %d\n", ret);
-		if (ret != -EPIPE && ret != -ECONNRESET)
+		if ((ret != -EPIPE && ret != -ECONNRESET) ||
+		    rq->cmd_flags & REQ_FAILFAST_DRIVER)
 			nvme_tcp_fail_request(queue->request);
 		nvme_tcp_done_send_req(queue);
 	}
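
For illustration only, here is a small userspace sketch of how the condition in the hunk above changes which send errors fail the request right away. It is not kernel code: DEMO_FAILFAST_DRIVER is a hypothetical stand-in for REQ_FAILFAST_DRIVER, the helper names are made up, and it assumes the request being sent carries that flag. The -32 in the log above is -EPIPE on Linux, which is the case the old check skips.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's REQ_FAILFAST_DRIVER bit. */
#define DEMO_FAILFAST_DRIVER (1u << 0)

/* Old check: fail the request unless the error is EPIPE or ECONNRESET. */
static bool should_fail_before(int ret, unsigned int cmd_flags)
{
	(void)cmd_flags; /* flags were not consulted before the change */
	return ret != -EPIPE && ret != -ECONNRESET;
}

/* New check: additionally fail the request when it is marked failfast. */
static bool should_fail_after(int ret, unsigned int cmd_flags)
{
	return (ret != -EPIPE && ret != -ECONNRESET) ||
	       (cmd_flags & DEMO_FAILFAST_DRIVER);
}

/* Walks through the cases that differ and the ones that stay the same. */
void demo_check(void)
{
	/* -EPIPE on a failfast request: previously left pending, now failed. */
	assert(!should_fail_before(-EPIPE, DEMO_FAILFAST_DRIVER));
	assert(should_fail_after(-EPIPE, DEMO_FAILFAST_DRIVER));

	/* -ECONNRESET on a request without the flag: unchanged, not failed. */
	assert(!should_fail_before(-ECONNRESET, 0));
	assert(!should_fail_after(-ECONNRESET, 0));

	/* Any other error: failed both before and after. */
	assert(should_fail_before(-EIO, 0));
	assert(should_fail_after(-EIO, 0));
}
```

Under these assumptions, only the EPIPE/ECONNRESET cases on a flagged request behave differently; requests without the flag are handled as before.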