nvmet_sq_destroy stuck forever when data digest is turned on

Grupi, Elad Elad.Grupi at dell.com
Tue Sep 19 04:52:03 PDT 2023


Hi

I am seeing nvmet_tcp_release_queue_work trigger a hung-task warning after waiting 2 minutes for nvmet_sq_destroy to complete.
The issue reproduces only when data digest is enabled.

Looking at the code of nvmet_tcp_release_queue_work, I see that it handles 'data in' commands:
it calls nvmet_req_uninit for any command whose data is still in transit.

However, there may be commands whose data transfer has already completed but whose data digest has not yet been received from the socket (i.e. rcv_state is NVMET_TCP_RECV_DDGST).
The data digest will never be read from the socket, because the socket is blocked by NVMET_TCP_RECV_ERR.
Hence nvmet_sq_destroy is stuck forever, waiting for nvmet_tcp_try_recv_ddgst to execute.
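
To make the scenario above concrete, here is a toy user-space model of it. This is only a sketch and not kernel code: every toy_* name, the reference counter, and the fixed-size command array are assumptions made for illustration, and the real nvmet-tcp structures and teardown path are more involved. It shows only the accounting problem: release cleans up commands whose data is in transit, but a command that is waiting for its data digest keeps its reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CMDS 4

/* Hypothetical stand-ins for the queue receive states named above. */
enum toy_rcv_state { TOY_RECV_PDU, TOY_RECV_DATA, TOY_RECV_DDGST, TOY_RECV_ERR };

/* Hypothetical per-command phases, for illustration only. */
enum toy_cmd_phase {
	TOY_CMD_IDLE,
	TOY_CMD_DATA_IN_TRANSIT, /* data still being received */
	TOY_CMD_AWAITING_DDGST   /* data complete, digest not yet read */
};

struct toy_queue {
	enum toy_rcv_state rcv_state;
	enum toy_cmd_phase cmds[TOY_NR_CMDS];
	int refs; /* references that the toy "sq destroy" waits for */
};

static void toy_queue_init(struct toy_queue *q)
{
	q->rcv_state = TOY_RECV_PDU;
	q->refs = 0;
	for (size_t i = 0; i < TOY_NR_CMDS; i++)
		q->cmds[i] = TOY_CMD_IDLE;
}

/* Mark a command as in flight; each in-flight command holds one reference. */
static void toy_start_cmd(struct toy_queue *q, size_t idx, enum toy_cmd_phase phase)
{
	assert(idx < TOY_NR_CMDS);
	assert(phase != TOY_CMD_IDLE);
	q->cmds[idx] = phase;
	q->refs++;
}

/*
 * Model of the release path as described in the report: the receive path is
 * put into the error state, and only commands whose data is still in transit
 * are cleaned up. Returns the number of references still held.
 */
static int toy_release_queue(struct toy_queue *q)
{
	q->rcv_state = TOY_RECV_ERR; /* no further receive processing */
	for (size_t i = 0; i < TOY_NR_CMDS; i++) {
		if (q->cmds[i] == TOY_CMD_DATA_IN_TRANSIT) {
			q->cmds[i] = TOY_CMD_IDLE;
			q->refs--;
		}
		/* TOY_CMD_AWAITING_DDGST is not handled, so its reference stays. */
	}
	return q->refs;
}

/* The toy "sq destroy" can finish only once every reference is dropped. */
static bool toy_sq_destroy_can_complete(const struct toy_queue *q)
{
	return q->refs == 0;
}
```

In this model, starting one command in TOY_CMD_DATA_IN_TRANSIT and one in TOY_CMD_AWAITING_DDGST and then calling toy_release_queue leaves one reference outstanding, so toy_sq_destroy_can_complete never becomes true. That mirrors the wait described above.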

Can you suggest a fix for such an issue?

Thanks,
Elad



