nvmet-tcp: deadlock on host disconnect

Grupi, Elad Elad.Grupi at dell.com
Wed Oct 20 08:41:20 PDT 2021


Hi

We are facing an issue that leads to a deadlock in the nvmet-tcp layer.
Our host closes the TCP connection immediately after sending the nvme-connect command, without waiting for its response.

As a result, nvmet_tcp_install_queue is blocked in flush_scheduled_work, waiting for the in-flight controller teardown to complete,
while nvmet_tcp_release_queue_work is blocked waiting for sq_destroy to finish. sq_destroy never finishes, because the connect command is stuck.
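
To make the circular wait explicit, here is a minimal sketch that models the situation as a wait-for graph and walks it until a node repeats. It is only an illustration, not kernel code: the function names are the ones mentioned above, but the edges are my reading of the hang, and the edge from the connect command to nvmet_tcp_install_queue is an assumption (the connect command cannot complete until the install step returns). find_cycle is a hypothetical helper written for this example.

```python
# Wait-for graph: each key cannot make progress until its value does.
# The edges are an interpretation of the reported hang, not taken from
# the kernel source.
WAITS_ON = {
    # Assumption: the connect command completes only after install returns.
    "connect command": "nvmet_tcp_install_queue",
    "nvmet_tcp_install_queue": "flush_scheduled_work",
    "flush_scheduled_work": "nvmet_tcp_release_queue_work",
    "nvmet_tcp_release_queue_work": "sq_destroy",
    "sq_destroy": "connect command",
}


def find_cycle(waits_on, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    seen = []
    node = start
    while node is not None and node not in seen:
        seen.append(node)
        node = waits_on.get(node)
    if node is None:
        # The chain ended at something that is not waiting: no deadlock.
        return None
    # Close the loop by repeating the first node of the cycle at the end.
    return seen[seen.index(node):] + [node]


cycle = find_cycle(WAITS_ON, "nvmet_tcp_install_queue")
print(" -> ".join(cycle))
```

If any one of these edges were removed, for example if the install step did not wait for the teardown work that depends on its own command, the walk would terminate and find_cycle would return None.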

I noticed that the same issue was reported in 2018 in the rdma layer, but I don't see any patch that fixed it:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1795846.html

Please let me know if you have more information about this deadlock.

Thanks,
Elad