Command timeouts with NVMe TCP kernel driver
Samuel Jones
sjones at kalrayinc.com
Mon Aug 30 06:30:29 PDT 2021
Hi all,
I'm experiencing command timeouts with recent versions of the nvme-tcp driver. A VM running 5.8 is fine, but the same test load (100% reads, 16K blocks) on a VM running 5.11 or later, including 5.14-rc7, shows the problem. The initiator complains as follows:
Aug 30 14:58:05 centos7 kernel: nvme nvme0: queue 7: timeout request 0x10 type 4
Aug 30 14:58:05 centos7 kernel: nvme nvme0: starting error recovery
My target is a Kalray board, and it has no trace of any outstanding commands when this situation occurs. Quite the opposite: the Kalray board observes that the initiator stops sending new requests on the queue some time before this command times out.
I don't have any similar issues using SPDK as an initiator. I made the following modification to the nvme-tcp code (forcing every submission through io_work instead of the inline send path) and my problem has disappeared:
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -287,6 +287,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
 	 * directly, otherwise queue io_work. Also, only do that if we
 	 * are on the same cpu, so we don't introduce contention.
 	 */
+#if 0
 	if (queue->io_cpu == raw_smp_processor_id() &&
 	    sync && empty && mutex_trylock(&queue->send_mutex)) {
 		queue->more_requests = !last;
@@ -296,6 +297,9 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
 	} else if (last) {
 		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
 	}
+#else
+	queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+#endif
 }
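For context, the unpatched function looks like this in the trees I'm testing (5.11-era drivers/nvme/host/tcp.c, quoted from my checkout, so exact line numbers may differ in yours):

static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
		bool sync, bool last)
{
	struct nvme_tcp_queue *queue = req->queue;
	bool empty;

	empty = llist_add(&req->lentry, &queue->req_list) &&
		list_empty(&queue->send_list) && !queue->request;

	/*
	 * if we're the first on the send_list and we can try to send
	 * directly, otherwise queue io_work. Also, only do that if we
	 * are on the same cpu, so we don't introduce contention.
	 */
	if (queue->io_cpu == raw_smp_processor_id() &&
	    sync && empty && mutex_trylock(&queue->send_mutex)) {
		queue->more_requests = !last;
		nvme_tcp_send_all(queue);
		queue->more_requests = false;
		mutex_unlock(&queue->send_mutex);
	} else if (last) {
		queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
	}
}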
To be honest, this bit of code has always bothered me: I don't understand how we can guarantee that the thread doesn't migrate to another CPU between the call to raw_smp_processor_id() and the trylock. I assume that even if this does occur, the fact that we hold queue->send_mutex is supposed to rule out any atomicity issues, but I'm concerned about the logic that tries to determine whether there is any more pending work and reschedules io_work if and only if there is something to do. I can't pinpoint an issue from just reading the code, though.
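To make the window I'm worried about concrete, here is a userspace analogue of the pattern (pthreads and sched_getcpu() standing in for the kernel primitives; the names and the reduced signature are mine, not the driver's):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

static pthread_mutex_t send_mutex = PTHREAD_MUTEX_INITIALIZER;
static int io_cpu;	/* CPU the io_work worker normally runs on */

/* Analogue of nvme_tcp_queue_request(); flags are passed in for brevity. */
static void queue_request(bool sync, bool empty, bool last)
{
	/*
	 * Nothing pins this thread between sched_getcpu() and the
	 * trylock, so the scheduler may migrate it inside this window
	 * and the inline send can end up running on a CPU != io_cpu.
	 */
	if (io_cpu == sched_getcpu() && sync && empty &&
	    pthread_mutex_trylock(&send_mutex) == 0) {
		/* ... send everything queued so far, inline ... */
		pthread_mutex_unlock(&send_mutex);
	} else if (last) {
		/*
		 * ... schedule io_work ... When the trylock fails and
		 * last == false, nothing is scheduled here at all: we
		 * rely on the send_mutex holder (or a later request) to
		 * pick the new request up. This is the "reschedule iff
		 * there is something to do" logic I can't convince
		 * myself is airtight.
		 */
	}
}

int main(void)
{
	io_cpu = sched_getcpu();
	queue_request(true, true, true);
	return 0;
}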
Does anyone have any thoughts on this? Is my patch proof that there is an issue in this code, or am I changing the program flow so significantly that we can't conclude anything? Are there any open issues wrt command timeouts at the moment?
Thanks in advance for any help
Sam
Samuel Jones
Datacenter SW Development Manager • Kalray
sjones at kalrayinc.com • www.kalrayinc.com
Intelligent Data Processing
From Cloud to Edge