[PATCH 0/3] nvme-tcp: start error recovery after KATO

Sagi Grimberg sagi at grimberg.me
Tue Sep 12 04:51:23 PDT 2023


> Hi all,
> 
> there have been some very insistent reports of data corruption
> with certain target implementations due to command retries.

None of which were reported on this list...

> Problem here is that for TCP we're starting error recovery
> immediately after either a command timeout or a (local) link loss.

It does so on only one occasion: when the user triggers a
reset_controller. A command timeout is longer than the default
kato (6 times longer, in fact). Was this the case where the issue
was observed? If so, the timeout handler should probably just
wait for the remaining kato time.
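
To make the "wait for the remaining kato time" idea concrete, here is a
rough sketch of the arithmetic only. It is not the nvme-tcp code: the
helper name and the constants are hypothetical, and the 5 s / 30 s
values are assumed just so the ratio matches the 6x mentioned above.

```c
#include <stdint.h>

/* Assumed values for illustration only; check your own configuration. */
#define EXAMPLE_KATO_MS       5000u   /* keep-alive timeout (assumption) */
#define EXAMPLE_IO_TIMEOUT_MS 30000u  /* command timeout, 6x the kato above */

/*
 * Hypothetical helper: given how long ago we last heard from the
 * controller (elapsed_ms), return how much longer error recovery
 * should be delayed so that a full kato period has passed.
 * Returns 0 when the kato period has already expired.
 */
static inline uint32_t kato_remaining_ms(uint32_t kato_ms, uint32_t elapsed_ms)
{
    return elapsed_ms >= kato_ms ? 0 : kato_ms - elapsed_ms;
}
```

With these assumed numbers, a command that hits the full I/O timeout
has already outlived kato, so kato_remaining_ms(EXAMPLE_KATO_MS,
EXAMPLE_IO_TIMEOUT_MS) is 0 and no extra wait is needed. Only an
earlier trigger (e.g. 2 s in) would leave a remainder to wait out.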

BTW, the same happens for rdma as well. Nothing should be
tcp specific here afaict.


