[PATCH v3 19/21] nvme-tcp: Extend FENCING state per TP4129 on CCR failure

Hannes Reinecke hare at suse.de
Wed Feb 18 00:26:31 PST 2026


On 2/17/26 18:58, Mohamed Khalfella wrote:
> On Mon 2026-02-16 13:56:10 +0100, Hannes Reinecke wrote:
>> On 2/14/26 05:25, Mohamed Khalfella wrote:
>>> If CCR operations fail and CQT is supported, we must defer the retry of
>>> inflight requests per TP4129. Update ctrl->fencing_work to schedule
>>> ctrl->fenced_work, effectively extending the FENCING state. This delay
>>> ensures that inflight requests are held until it is safe for them to be
>>> retried.
>>>
>>> Signed-off-by: Mohamed Khalfella <mkhalfella at purestorage.com>
>>> ---
>>>    drivers/nvme/host/tcp.c | 39 +++++++++++++++++++++++++++++++++++----
>>>    1 file changed, 35 insertions(+), 4 deletions(-)
>>>
>> Can't you merge / integrate this into the nvme_fence_ctrl() routine?
> 
> ctrl->fencing_work and ctrl->fenced_work are in transport specific
> controller, struct nvme_tcp_ctrl in this case. There is no easy way to
> access these members from nvme_fence_ctrl(). One option to get around
> that is to move them into struct nvme_ctrl. But we call error recovery
> after a controller is fenced, and error recovery is implemented in
> transport specific way. That is why the delay is implemented/repeated
> for every transport.
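To illustrate the layout problem described above, here is a simplified userspace sketch of the embedding pattern: the transport controller wraps the core controller, so only transport code, which knows the wrapper type, can reach fencing_work and fenced_work via container_of(). All *_model types and helper names are hypothetical stand-ins; the container_of macro is a userspace re-implementation of the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct work_model { int scheduled; };

/* Core-visible controller: core code (e.g. a fencing routine) sees only this. */
struct nvme_ctrl_model {
	int state;
};

/* Transport-private wrapper that embeds the core controller. */
struct nvme_tcp_ctrl_model {
	struct nvme_ctrl_model ctrl;
	struct work_model fencing_work;
	struct work_model fenced_work;
};

/* Only the transport knows the wrapper type, so only it can do this. */
static struct nvme_tcp_ctrl_model *to_tcp_ctrl_model(struct nvme_ctrl_model *ctrl)
{
	assert(ctrl != NULL);
	return container_of(ctrl, struct nvme_tcp_ctrl_model, ctrl);
}

/* A transport helper that core code could reach only via a callback. */
static void tcp_model_schedule_fenced(struct nvme_ctrl_model *ctrl)
{
	to_tcp_ctrl_model(ctrl)->fenced_work.scheduled = 1;
}
```

Moving the work items into the core struct would remove the need for the conversion, but recovery after fencing would still be transport specific, which is the point made above.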
> 
>> The previous patch already extended the timeout to account for CQT, so
>> we can just wait for the timeout if CCR failed, no?
> 
> Following on from the point above, one change that could be made is to
> reset the controller after fencing finishes instead of using error
> recovery. This way everything lives in core.c, but I have not tested that.
> 
> Do you think this is better than what has been implemented now?
> 
Yeah, the eternal problem.
At some point someone will have to explain to me why 'reset' and
'error handling' are two _distinct_ code paths in nvme-tcp.
I really don't get that. I _guess_ it's trying to hold requests
when doing a reset, and aborting requests if it's an error.
But why one needs to make that distinction is a mystery to
me; FC combines both paths and seems to work quite happily.

Thing is, that will get in the way when trying to move fencing
into the generic layer; you can only call 'nvme_reset_ctrl()'
and hope that it will abort commands.

I'll check.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



More information about the Linux-nvme mailing list