nvme-cli connect regression

Tue Apr 1 08:25:51 PDT 2025

Hi,

Luca reported "occasional failures in the systemd integration test
that uses nvme-cli" [1]:

[   10.378713] TEST-84-STORAGETM.sh[316]: + nvme connect-all -t tcp -a 127.0.0.1 -s 16858 --hostid=95fe8041-3f53-415b-bc40-1bbd8932e7e8
[   10.397892] nvme_tcp: queue 0: failed to receive icresp, error -4
[   10.397326] TEST-84-STORAGETM.sh[340]: failed to add controller, error failed to write to nvme-fabrics device

I was not able to identify any changes in nvme-cli v2.12 which could
explain this problem. The kernel in question is a stable kernel (6.13.7)
which got the following commit backported:

578539e09690 ("nvme-tcp: fix connect failure on receiving partial ICResp
PDU").

The change itself looks okay, but I think it introduces a behavior change
for the initial connect attempt.

The error code is -4 (-EINTR). Should the kernel retry here, or is
userland responsible for retrying? Assuming we should retry... Thoughts?
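
For illustration only, here is a minimal sketch of what a userland retry
could look like, assuming the failed write to the nvme-fabrics device
surfaces to the caller as errno == EINTR. The helper name
write_retry_eintr() and the max_attempts parameter are hypothetical; this
is not existing nvme-cli or libnvme code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of nvme-cli or libnvme: call write() and
 * repeat the call if it fails with EINTR, up to max_attempts calls in total.
 * Returns the result of the last write() call; errno is left as set by it.
 */
static ssize_t write_retry_eintr(int fd, const void *buf, size_t len,
                                 int max_attempts)
{
    ssize_t ret;
    int attempt = 0;

    assert(max_attempts > 0);

    do {
        ret = write(fd, buf, len);
        /* Only EINTR triggers another attempt; any other error is final. */
    } while (ret < 0 && errno == EINTR && ++attempt < max_attempts);

    return ret;
}
```

A caller would pass the file descriptor of the opened fabrics device and
the connect option string. Whether repeating the write is actually safe
for a connect request is part of the question above, so the retry count is
bounded rather than looping forever.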

Thanks,
Daniel

[1] https://github.com/linux-nvme/nvme-cli/issues/2760