[PATCH v3 00/21] TP8028 Rapid Path Failure Recovery

Mohamed Khalfella mkhalfella at purestorage.com
Fri Feb 13 20:25:01 PST 2026


This patchset adds support for TP8028 Rapid Path Failure Recovery to
both the nvme target and the initiator. Rapid Path Failure Recovery
brings Cross-Controller Reset (CCR) functionality to nvme. This allows
an nvme host to send a command to a source nvme controller asking it
to reset the impacted nvme controller, provided that both the source
and the impacted controllers are in the same nvme subsystem.

The main use of CCR is when one path to the nvme subsystem fails.
Inflight IOs on the impacted nvme controller need to be terminated
before they can be retried on another path; otherwise data corruption
may happen. CCR provides a quick way to terminate these IOs on the
unreachable nvme controller, allowing recovery to proceed without
unnecessary delays. If CCR is not possible, inflight requests are held
for the duration defined by TP4129 KATO Corrections and Clarifications
before they are allowed to be retried.


On the target side:

* New struct members have been added to support CCR. struct nvme_id_ctrl
  has been updated with CIU (Controller Instance Uniquifier), CIRN
  (Controller Instance Random Number), and CQT (Command Quiesce Time).
  The combination of CIU, CNTLID, and CIRN is used to identify the
  impacted controller in a CCR command.

* The CCR nvme command implemented on the target causes the impacted
  controller to fail and drop its connections to the host.

* The CCR logpage contains the status of pending CCR requests. An entry
  is added to the logpage after a CCR request is validated. Completed
  CCR requests are removed from the logpage when the controller becomes
  ready or when requested in a Get Log Page command.

* An AEN is sent when a CCR request completes to let the host know that
  it is safe to retry inflight requests.
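To illustrate how the (CIU, CNTLID, CIRN) triple might be used to pick
out the impacted controller, here is a minimal C sketch. The struct,
its field widths, and find_impacted_ctrl() are hypothetical and not
taken from the patches or from TP8028; the point is only that all three
fields must match, so a controller that merely shares a CNTLID with a
stale CIU/CIRN is presumably not selected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identity triple; the field widths are assumptions. */
struct ccr_ctrl_id {
	uint8_t  ciu;    /* Controller Instance Uniquifier */
	uint16_t cntlid; /* Controller ID */
	uint64_t cirn;   /* Controller Instance Random Number */
};

/*
 * Returns the index of the controller whose CIU, CNTLID, and CIRN all
 * match @want, or -1 if none does.
 */
static int find_impacted_ctrl(const struct ccr_ctrl_id *ctrls, size_t n,
			      const struct ccr_ctrl_id *want)
{
	for (size_t i = 0; i < n; i++) {
		if (ctrls[i].cntlid == want->cntlid &&
		    ctrls[i].ciu == want->ciu &&
		    ctrls[i].cirn == want->cirn)
			return (int)i;
	}
	return -1;
}
```

For example, given controllers {CIU 1, CNTLID 1} and {CIU 2, CNTLID 2},
a request naming CNTLID 2 but carrying a different CIU or CIRN returns
-1 instead of selecting the second controller.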


On the host side:

* CIU, CIRN, and CQT have been added to struct nvme_ctrl. CIU and CIRN
  are exposed in sysfs so that their values are visible to the (root)
  user, who can use them to construct and manually send CCR commands
  via admin-passthru.

* New controller states FENCING and FENCED have been added to make sure
  that inflight requests do not get canceled if they time out during
  the fencing process. FENCED exists so that the controller state
  machine does not have a transition from FENCING to RESETTING; the
  path is FENCING -> FENCED -> RESETTING instead. This prevents a
  controller that is being fenced from getting reset. The impacted
  controller is reset only after fencing finishes.

* Controller recovery in nvme_fence_ctrl() is invoked when a LIVE
  controller hits an error or when a request times out. CCR is
  attempted first to reset the impacted controller. If it fails,
  inflight requests are held until it is safe to retry them.

* The nvme fabrics transports nvme-tcp, nvme-rdma, and nvme-fc have
  been updated to use CCR recovery.
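The FENCING -> FENCED -> RESETTING ordering can be pictured as a
transition check. The C sketch below models only a small subset of
controller states under hypothetical sketch_* names; it is not the
nvme_change_ctrl_state() table from the patches, and other transitions
(for example into DELETING) are deliberately left out.

```c
#include <stdbool.h>

/* Simplified subset of controller states, for illustration only. */
enum sketch_ctrl_state {
	SKETCH_LIVE,
	SKETCH_FENCING,
	SKETCH_FENCED,
	SKETCH_RESETTING,
};

/*
 * Hypothetical transition check modeled on the description above:
 * LIVE -> FENCING -> FENCED -> RESETTING. There is no FENCING ->
 * RESETTING edge, so a reset cannot start while fencing is in progress.
 */
static bool sketch_can_change_state(enum sketch_ctrl_state old_state,
				    enum sketch_ctrl_state new_state)
{
	switch (new_state) {
	case SKETCH_FENCING:
		return old_state == SKETCH_LIVE;
	case SKETCH_FENCED:
		return old_state == SKETCH_FENCING;
	case SKETCH_RESETTING:
		return old_state == SKETCH_LIVE || old_state == SKETCH_FENCED;
	default:
		return false;
	}
}
```

Routing every reset of a fenced controller through FENCED means the
only way out of FENCING is to finish fencing first, which is the
property the new states are meant to guarantee.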


Ideally, all inflight requests should be held during controller
recovery and only retried after recovery is done. However, there are
known situations where this implementation does not do so. These gaps
will be addressed in future patches:

* A manual controller reset from sysfs moves the controller to the
  RESETTING state and cancels all inflight requests immediately; they
  may then be retried on another path.

* A manual controller delete from sysfs also cancels all inflight
  requests immediately, and they may be retried on another path.

* In nvme-fc, the nvme controller is deleted if the remote port
  disappears with no timeout specified. This results in immediate
  cancellation of requests that may be retried on another path.

* In nvme-rdma, if the HCA is removed, all nvme controllers are
  deleted. This cancels inflight IOs, which may then be retried on
  another path.


Changes from v2:

- nvmet: Implement CCR nvme command
  - Minor changes addressing review comments on v2.

- nvme: Rapid Path Failure Recovery read controller identify fields
  Addressed a security concern that CCR could be used to cause a denial
  of service. Changed the permissions of the CIU and CIRN sysfs
  attributes from S_IRUGO to S_IRUSR, so that only the root user can
  read these attributes.

- nvme: Introduce FENCING and FENCED controller states
  Addressed code review comments. Minor changes.

- nvme: Implement cross-controller reset recovery
  - Refactored nvme_find_ctrl_ccr() to use more idiomatic code.
  - Updated nvme_issue_wait_ccr() to return:
    - 0 on success.
    - EIO on failure to submit the CCR command.
    - ETIMEDOUT on timeout waiting for the CCR operation.
    - EREMOTEIO if the CCR operation failed.
  - Updated nvme_fence_ctrl() so that the CCR operation is tried on at
    most one source controller.

- nvme-tcp: Use CCR to recover controller that hits an error
  - Dropped ctrl->fenced_work. Moved to CQT patches.
  - nvme_tcp_fencing_work() resets the controller regardless of CCR
    success or failure.

- nvme-rdma: Use CCR to recover controller that hits an error
  - Similar to nvme-tcp

- nvme-fc: Decouple error recovery from controller reset
  - nvme_fc_start_ioerr_recovery() queues ctrl->ioerr_work in the
    CONNECTING, DELETING, and DELETING_NOIO states without changing the
    controller state. For CONNECTING, this addresses an issue raised
    during code review. For DELETING{_NOIO}, it addresses an issue
    observed during testing where a controller is deleted with inflight
    IOs.
  - Updated nvme_fc_ctrl_ioerr_work() to handle the CONNECTING state
    specially by only aborting outstanding IO. This change addresses an
    issue raised during code review.
  - nvme_fc_error_recovery() has been updated to flush
    ctrl->ctrl.async_event_work, as mentioned in code review.

- nvme-fc: Use CCR to recover controller that hits an error
  - Changes similar to nvme-rdma and nvme-tcp

- nvme-fc: Do not cancel requests in io taget before it is initialized
  - A new patch added to address an issue observed during testing.

- CQT changes have been pulled out into separate patches:
  - nvmet: Add support for CQT to nvme target
  - nvme: Add support for CQT to nvme host
  - nvme: Update CCR completion wait timeout to consider CQT
  - nvme-tcp: Extend FENCING state per TP4129 on CCR failure
  - nvme-rdma: Extend FENCING state per TP4129 on CCR failure
  - nvme-fc: Extend FENCING state per TP4129 on CCR failure
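As a rough illustration of how a fencing path might consume the
nvme_issue_wait_ccr() return codes listed in the v2 changes, the C
sketch below classifies them. The enum, sketch_classify_ccr(), and the
negative-errno convention are assumptions made for illustration. Per
the notes above, the controller is reset in every case; on CCR failure
the FENCING state is first extended per TP4129.

```c
#include <errno.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121 /* Linux value; only defined here if libc lacks it */
#endif

/* Hypothetical outcome of a single CCR attempt. */
enum sketch_fence_result {
	SKETCH_FENCE_CCR_DONE, /* impacted controller was reset via CCR */
	SKETCH_FENCE_HOLD,     /* CCR did not succeed: hold per TP4129 */
};

/*
 * Maps a CCR attempt's return value (negative errno assumed) to an
 * outcome. Every error leads to holding inflight requests; the codes
 * are listed separately only to document what each one means.
 */
static enum sketch_fence_result sketch_classify_ccr(int ret)
{
	switch (ret) {
	case 0:
		return SKETCH_FENCE_CCR_DONE;
	case -EIO:       /* failed to submit the CCR command */
	case -ETIMEDOUT: /* timed out waiting for the CCR operation */
	case -EREMOTEIO: /* the CCR operation itself failed */
	default:         /* anything else is treated as a failure too */
		return SKETCH_FENCE_HOLD;
	}
}
```

Since CCR is tried on at most one source controller, a single call
like this is enough to decide between resetting right away and waiting
out the extended FENCING period first.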

v2: https://lore.kernel.org/all/20260130223531.2478849-1-mkhalfella@purestorage.com/

Mohamed Khalfella (21):
  nvmet: Rapid Path Failure Recovery set controller identify fields
  nvmet/debugfs: Export controller CIU and CIRN via debugfs
  nvmet: Implement CCR nvme command
  nvmet: Implement CCR logpage
  nvmet: Send an AEN on CCR completion
  nvme: Rapid Path Failure Recovery read controller identify fields
  nvme: Introduce FENCING and FENCED controller states
  nvme: Implement cross-controller reset recovery
  nvme: Implement cross-controller reset completion
  nvme-tcp: Use CCR to recover controller that hits an error
  nvme-rdma: Use CCR to recover controller that hits an error
  nvme-fc: Decouple error recovery from controller reset
  nvme-fc: Use CCR to recover controller that hits an error
  nvme-fc: Hold inflight requests while in FENCING state
  nvme-fc: Do not cancel requests in io taget before it is initialized
  nvmet: Add support for CQT to nvme target
  nvme: Add support for CQT to nvme host
  nvme: Update CCR completion wait timeout to consider CQT
  nvme-tcp: Extend FENCING state per TP4129 on CCR failure
  nvme-rdma: Extend FENCING state per TP4129 on CCR failure
  nvme-fc: Extend FENCING state per TP4129 on CCR failure

 drivers/nvme/host/constants.c   |   1 +
 drivers/nvme/host/core.c        | 222 +++++++++++++++++++++++++++-
 drivers/nvme/host/fc.c          | 249 ++++++++++++++++++++++++--------
 drivers/nvme/host/nvme.h        |  25 ++++
 drivers/nvme/host/rdma.c        |  63 +++++++-
 drivers/nvme/host/sysfs.c       |  27 ++++
 drivers/nvme/host/tcp.c         |  63 +++++++-
 drivers/nvme/target/admin-cmd.c | 124 ++++++++++++++++
 drivers/nvme/target/configfs.c  |  31 ++++
 drivers/nvme/target/core.c      | 113 ++++++++++++++-
 drivers/nvme/target/debugfs.c   |  21 +++
 drivers/nvme/target/nvmet.h     |  20 ++-
 include/linux/nvme.h            |  70 ++++++++-
 13 files changed, 953 insertions(+), 76 deletions(-)


base-commit: cd7a5651db263b5384aef1950898e5e889425134
-- 
2.52.0



