nvme-tcp failed to send request -32, kernel ver 4.18.0-553.16.1.el8_10.x86_64
Engel, Amit
Amit.Engel at Dell.com
Thu Nov 7 02:12:25 PST 2024
Hello,
We have recently started hitting nvme-tcp failures with kernel version 4.18.0-553.16.1.el8_10.x86_64.
The failure is:
2024-10-21T15:44:38.339450+00:00 nc1095201 kernel: [32216.873887] nvme nvme1: failed to send request -32
From the log snippet below, you can see that after a request timeout, the controller (nvme1) tries to reconnect.
After some time, 80 I/O queues are created and mapped.
From that point on we start seeing ‘failed to send request -32’ errors (errno EPIPE), which may make sense since the controller has not yet successfully reconnected.
However, even after the nvme1 controller reports ‘Successfully reconnected’, send request failures are still seen:
2024-10-23T19:45:59.962818+03:00 nc5222171 kernel: [37768.167911] nvme nvme1: Successfully reconnected (218 attempt)
2024-10-23T19:46:05.018829+03:00 nc5222171 kernel: [37773.223076] nvme nvme1: failed to send request -32
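For reference, the -32 in these messages is a negated errno value. A minimal Python sketch for decoding such codes on a Linux host follows; the helper name describe_kernel_errno is hypothetical and used only for illustration:

```python
import errno
import os


def describe_kernel_errno(code: int) -> str:
    """Return 'NAME (message)' for a kernel-style negative error code.

    Hypothetical helper: the kernel logs errors as negated errno values,
    so the sign is dropped before the lookup.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name} ({os.strerror(number)})"


# On Linux, 32 maps to EPIPE ("Broken pipe").
print(describe_kernel_errno(-32))
```

EPIPE on a TCP socket generally means the send was attempted on a connection that had already been shut down for writing or reset.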
On the target side, there is no indication of any connectivity issue or failure.
Log snippet:
2024-10-23T19:45:38.202874+03:00 nc5222171 kernel: [37746.410245] nvme nvme1: Reconnecting in 10 seconds...
2024-10-23T19:45:48.449826+03:00 nc5222171 kernel: [37756.655842] nvme nvme1: creating 80 I/O queues.
2024-10-23T19:45:55.491825+03:00 nc5222171 kernel: [37763.697243] nvme nvme1: mapped 80/0/0 default/read/poll queues.
2024-10-23T19:45:59.955822+03:00 nc5222171 kernel: [37768.160705] nvme nvme1: failed to send request -32
2024-10-23T19:45:59.955855+03:00 nc5222171 kernel: [37768.161396] nvme nvme1: Failed to configure AEN (cfg 900)
2024-10-23T19:45:59.956743+03:00 nc5222171 kernel: [37768.161525] nvme nvme1: failed to send request -32
2024-10-23T19:45:59.956753+03:00 nc5222171 kernel: [37768.161825] nvme nvme1: failed to send request -32
2024-10-23T19:45:59.956753+03:00 nc5222171 kernel: [37768.162140] nvme nvme1: failed to send request -32
2024-10-23T19:45:59.957735+03:00 nc5222171 kernel: [37768.162411] nvme nvme1: Identify NS List failed (status=0x370)
2024-10-23T19:45:59.957741+03:00 nc5222171 kernel: [37768.162463] nvme nvme1: failed to send request -32
2024-10-23T19:45:59.962818+03:00 nc5222171 kernel: [37768.167911] nvme nvme1: Successfully reconnected (218 attempt)
2024-10-23T19:46:05.018829+03:00 nc5222171 kernel: [37773.223076] nvme nvme1: failed to send request -32
2024-10-23T19:46:05.018860+03:00 nc5222171 kernel: [37773.223411] nvme nvme1: failed nvme_keep_alive_end_io error=4
2024-10-23T20:56:35.029818+03:00 nc5222171 kernel: [42002.862957] nvme nvme0: failed to send request -32
2024-10-23T20:56:35.052806+03:00 nc5222171 kernel: [42002.886076] nvme nvme1: failed to send request -32
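To pick out only the send failures that occur after a controller has reported a successful reconnect, the log can be filtered with a short script. This is a rough sketch, assuming the syslog line layout shown above; the regular expression and the function name send_failures_after_reconnect are my own and not part of any nvme tooling:

```python
import re

# Assumed line layout: "<timestamp> <host> kernel: [<uptime>] nvme <ctrl>: <message>"
LINE_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] nvme (?P<ctrl>nvme\d+): (?P<msg>.*)$"
)


def send_failures_after_reconnect(lines):
    """Yield (ctrl, uptime, seconds_since_reconnect) for every
    'failed to send request' logged while the controller's most recent
    state message was 'Successfully reconnected'."""
    last_reconnect = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ctrl, uptime, msg = m["ctrl"], float(m["uptime"]), m["msg"]
        if msg.startswith("Successfully reconnected"):
            last_reconnect[ctrl] = uptime
        elif msg.startswith("Reconnecting in"):
            # A new reconnect cycle started; failures are expected again.
            last_reconnect.pop(ctrl, None)
        elif msg.startswith("failed to send request") and ctrl in last_reconnect:
            yield ctrl, uptime, uptime - last_reconnect[ctrl]


sample = [
    "2024-10-23T19:45:59.957741+03:00 nc5222171 kernel: [37768.162463] nvme nvme1: failed to send request -32",
    "2024-10-23T19:45:59.962818+03:00 nc5222171 kernel: [37768.167911] nvme nvme1: Successfully reconnected (218 attempt)",
    "2024-10-23T19:46:05.018829+03:00 nc5222171 kernel: [37773.223076] nvme nvme1: failed to send request -32",
]

hits = list(send_failures_after_reconnect(sample))
for ctrl, uptime, delta in hits:
    print(f"{ctrl}: send failure at {uptime:.6f}, {delta:.3f}s after reconnect")
```

With the three sample lines, only the last one is reported, roughly five seconds after the reconnect message.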
Has anyone seen this behavior before? Any idea why it happens?
Thanks
Amit E