[PATCH 0/3] nvme-tcp: start error recovery after KATO

Sagi Grimberg sagi at grimberg.me
Tue Sep 12 05:12:42 PDT 2023


>>> Hi all,
>>>
>>> there have been some very insistent reports of data corruption
>>> with certain target implementations due to command retries.
>>
>> None of which were reported on this list...
>>
> Correct. I can ask them to post their finding here if that would make a 
> difference.
> 
>>> Problem here is that for TCP we're starting error recovery
>>> immediately after either a command timeout or a (local) link loss.
>>
>> It does so only in one occasion, when the user triggered a
>> reset_controller. a command timeout is greater than the default
>> kato (6 times in fact), was this the case where the issue was
>> observed? If so, the timeout handler should probably just wait
>> the kato remaining time.
>>
> Nothing to do with reset_controller.
> The problem really is in nvme_tcp_state_change(), which will blindly 
> start error recovery (and a subsequent command retry) whenever the local 
> link drops.

That is incorrect. A link drop does not trigger a socket state change.
a state change is triggered for specific socket states
(see net/ipv4/tcp.c for description of socket states).

These are a set of states that describes local and remote sides
shutting down (or a TIME_WAIT). The cases where the host
triggers error recovery without the controller acknowledging it
(in case the link is down) are FIN_WAIT1/2, and these happen
when the user triggers a controller reset/disconnect.

In the other circumstances, we'll see a TCP_CLOSE/CLOSE_WAIT/CLOSING
which should indicate the controller acknowledged the shutdown.

> And that error recover does _not_ wait for KATO, but rather
> causes an immediate retry.

We can't blindly wait for KATO. If the controller saw the socket
closure, there is no point waiting kato.

If the user sets KATO to be 3 hours, and does a controller reset,
what happens now?

Only when the host triggers the shutdown, and that is not acknowledged
by the controller, then yes, the host needs to wait the remaining of
kato.



More information about the Linux-nvme mailing list