[PATCH 0/3] nvme: add support to delay retrying aborted requests

Thu Apr 10 05:20:51 PDT 2025

Hi all,

In this thread [1], we discussed a potential data corruption issue in NVMe
multipath setups when I/O is retried during path failover. To avoid this,
section 9.6 "Communication Loss Handling" of the NVMe Base Specification 2.1
requires the host to close the connection and wait for a sufficient period
before retrying an aborted request, so that the target has enough time to
detect the connection loss and clean up any residual I/O.

With this patchset, the host closes the connection and waits for
delay_io_retry_time before retrying aborted I/O. The last patch exports
delay_io_retry_time through sysfs.
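
To illustrate the intended behaviour, here is a minimal user-space sketch of
the "wait before retry" check. It is not the code from this series: the
struct, the helper names, and the millisecond unit are assumptions made for
illustration only, and delay_io_retry_ms merely models delay_io_retry_time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the retry delay; not taken from the patches. */
struct retry_gate {
	uint64_t conn_closed_at_ms;	/* time the host closed the connection */
	uint64_t delay_io_retry_ms;	/* models delay_io_retry_time (unit assumed) */
};

/* Milliseconds still to wait before an aborted request may be retried. */
static uint64_t retry_gate_remaining(const struct retry_gate *g, uint64_t now_ms)
{
	uint64_t elapsed;

	/* A clock reading before the close time counts as no time elapsed. */
	if (now_ms <= g->conn_closed_at_ms)
		return g->delay_io_retry_ms;

	elapsed = now_ms - g->conn_closed_at_ms;
	if (elapsed >= g->delay_io_retry_ms)
		return 0;

	return g->delay_io_retry_ms - elapsed;
}

/* True once the full delay has passed since the connection was closed. */
static bool retry_gate_may_retry(const struct retry_gate *g, uint64_t now_ms)
{
	return retry_gate_remaining(g, now_ms) == 0;
}
```

In this model a failover path would hold aborted requests until
retry_gate_may_retry() returns true, and only then requeue them on another
path.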

Comments and reviews are welcome.

[1] https://lore.kernel.org/r/DD72B155-35CF-47F6-A342-13445A9E432F@gmail.com

Jiewei Ke (3):
  nvme: check IO queues liveness during controller reconnection
  nvme: add support to delay retrying aborted requests
  nvme: export delay_io_retry_time to sysfs

 drivers/nvme/host/core.c      |  3 ++
 drivers/nvme/host/fabrics.c   | 23 ++++++++++++++-
 drivers/nvme/host/fabrics.h   |  3 ++
 drivers/nvme/host/multipath.c | 54 ++++++++++++++++++++++++++++++++---
 drivers/nvme/host/nvme.h      |  7 +++++
 drivers/nvme/host/rdma.c      | 27 +++++++++++++++++-
 drivers/nvme/host/sysfs.c     | 37 ++++++++++++++++++++++++
 drivers/nvme/host/tcp.c       | 27 +++++++++++++++++-
 8 files changed, 174 insertions(+), 7 deletions(-)

-- 
2.36.0