[PATCH 0/3] nvme: add support to delay retrying aborted requests
Jiewei Ke
jiewei at smartx.com
Thu Apr 10 05:20:51 PDT 2025
Hi all,
In this thread [1], we discussed a potential data corruption issue in NVMe
multipath scenarios when retrying I/O during path failover. According to
section 9.6 "Communication Loss Handling" of the NVMe Base Specification 2.1,
to avoid such risks, the NVMe host must close the connection and wait for a
sufficient period before retrying an aborted request. This ensures that the
target has enough time to detect the connection loss and clean up any residual
I/O.
With this patchset, the host will close the connection and wait for
delay_io_retry_time before retrying the aborted I/O.
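As a rough illustration of the intended behavior (a userspace model, not the
code in these patches), the retry gate can be sketched as below. Apart from
delay_io_retry_time, every name here (struct path_model and the path_*
helpers) is hypothetical, and the millisecond unit is an assumption made only
for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one path; not a kernel structure. */
struct path_model {
	bool     connected;         /* transport connection is up */
	uint64_t closed_at_ms;      /* time at which the host closed it */
	uint64_t delay_io_retry_ms; /* models delay_io_retry_time (unit assumed) */
};

/* A request was aborted: close the connection and remember when. */
void path_abort_and_close(struct path_model *p, uint64_t now_ms)
{
	assert(p);
	if (p->connected) {
		p->connected = false;
		p->closed_at_ms = now_ms;
	}
}

/* Remaining wait before the aborted request may be retried. */
uint64_t path_retry_wait_ms(const struct path_model *p, uint64_t now_ms)
{
	uint64_t elapsed;

	assert(p);
	if (p->connected)
		return p->delay_io_retry_ms; /* delay starts only after close */

	/* Treat a clock that appears to run backwards as "no time elapsed". */
	elapsed = now_ms >= p->closed_at_ms ? now_ms - p->closed_at_ms : 0;
	return elapsed >= p->delay_io_retry_ms ?
		0 : p->delay_io_retry_ms - elapsed;
}

/* Retry is allowed only once the connection is closed and the delay is over. */
bool path_retry_allowed(const struct path_model *p, uint64_t now_ms)
{
	assert(p);
	return !p->connected && path_retry_wait_ms(p, now_ms) == 0;
}
```

In this model, a request aborted at t = 1000 with a delay of 5000 is held back
until t = 6000, which stands in for the time the target needs to notice the
connection loss and drop any residual I/O.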
Comments and reviews are welcome.
[1] https://lore.kernel.org/r/DD72B155-35CF-47F6-A342-13445A9E432F@gmail.com
Jiewei Ke (3):
  nvme: check IO queues liveness during controller reconnection
  nvme: add support to delay retrying aborted requests
  nvme: export delay_io_retry_time to sysfs

 drivers/nvme/host/core.c      |  3 ++
 drivers/nvme/host/fabrics.c   | 23 ++++++++++++++-
 drivers/nvme/host/fabrics.h   |  3 ++
 drivers/nvme/host/multipath.c | 54 ++++++++++++++++++++++++++++++++---
 drivers/nvme/host/nvme.h      |  7 +++++
 drivers/nvme/host/rdma.c      | 27 +++++++++++++++++-
 drivers/nvme/host/sysfs.c     | 37 ++++++++++++++++++++++++
 drivers/nvme/host/tcp.c       | 27 +++++++++++++++++-
 8 files changed, 174 insertions(+), 7 deletions(-)
--
2.36.0