[PATCH v2 11/14] nvme-rdma: Use CCR to recover controller that hits an error
Hannes Reinecke
hare at suse.de
Mon Feb 2 21:35:59 PST 2026
On 1/30/26 23:34, Mohamed Khalfella wrote:
> An alive nvme controller that hits an error now will move to FENCING
> state instead of RESETTING state. ctrl->fencing_work attempts CCR to
> terminate inflight IOs. If CCR succeeds, switch to FENCED -> RESETTING
> and continue error recovery as usual. If CCR fails, the behavior depends
> on whether the subsystem supports CQT or not. If CQT is not supported
> then reset the controller immediately as if CCR succeeded in order to
> maintain the current behavior. If CQT is supported switch to time-based
> recovery. Schedule ctrl->fenced_work resets the controller when time
> based recovery finishes.
>
> Either ctrl->err_work or ctrl->reset_work can run after a controller is
> fenced. Flush fencing work when either work run.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella at purestorage.com>
> ---
> drivers/nvme/host/rdma.c | 62 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 61 insertions(+), 1 deletion(-)
>
The rdma driver is largely similar to the tcp one, so comments there
apply to both, one would guess.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
More information about the Linux-nvme
mailing list