[PATCH v2 08/14] nvme: Implement cross-controller reset recovery

Tue Feb 10 14:49:15 PST 2026

On 2/10/2026 2:27 PM, Mohamed Khalfella wrote:
> On Tue 2026-02-10 14:09:27 -0800, James Smart wrote:
>> On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:
>> ...
>>> +unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
>>> +{
>>> +	unsigned long deadline, now, timeout;
>>> +	struct nvme_ctrl *sctrl;
>>> +	u32 min_cntlid = 0;
>>> +	int ret;
>>> +
>>> +	timeout = nvme_fence_timeout_ms(ictrl);
>>> +	dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
>>> +
>>> +	now = jiffies;
>>> +	deadline = now + msecs_to_jiffies(timeout);
>>> +	while (time_before(now, deadline)) {
>>
>> Q: don't we have something to identify that the controller's subsystem
>> supports CCR before we start selecting controllers and sending CCR?
>>
>> I would think on older devices that don't support it we should be
>> skipping this loop.   The loop could delay the Time-Based delay without
>> any CCR.
> 
> I do not think we have something that identifies CCR support at
> subsystem level. The spec defines CCRL at the controller level. The loop
> should not be that bad. nvme_find_ctrl_ccr() should return NULL if CCR is
> not supported and nvme_fence_ctrl() will return immediately.
> 
>>
>> -- james
>>

I would think CCRL on the failed controller would be enough to assume 
the subsystem supports it.

I'm not worried that the coding on the host is so bad. It's more the
multiple paths that must have cmds sent to them and then get error
responses for unknown cmds (they should be responded to ok, but you never
know), as well as creating conditions for other errors where there will
be no return for it - e.g. other paths losing connectivity while the CCR
is outstanding, etc. Yes, they all have to work, but why bother adding
these flows to an old controller that would never do CCR?
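
Roughly what I have in mind, as an untested userspace sketch rather than
real driver code. The struct, the cached ccrl field, and both helper names
are made up for illustration and are not in the patch. It also assumes a
zero CCRL in the failed controller's Identify data means CCR is not
supported:

```c
#include <stdbool.h>

/*
 * Stand-in for struct nvme_ctrl. The 'ccrl' field is an assumption: a
 * cached copy of CCRL from Identify Controller, with 0 taken to mean
 * "CCR not supported".
 */
struct fake_ctrl {
	unsigned char ccrl;
};

/*
 * Hypothetical helper: treat the failed controller's own CCRL as a proxy
 * for CCR support in its subsystem.
 */
static bool fake_ctrl_supports_ccr(const struct fake_ctrl *ictrl)
{
	return ictrl->ccrl != 0;
}

/*
 * Returns the number of CCR attempts made. 0 means the loop was skipped,
 * so no other path was ever sent a command, and the caller goes straight
 * to the time-based delay.
 */
static int fake_fence_ctrl(const struct fake_ctrl *ictrl, int max_attempts)
{
	int attempts = 0;

	if (!fake_ctrl_supports_ccr(ictrl))
		return 0;

	/* Placeholder for: pick a source controller, send CCR, check result. */
	while (attempts < max_attempts)
		attempts++;

	return attempts;
}
```

The point is only the early return: an old controller never causes a cmd
to be issued on any of the other paths.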

-- james