[bug report] blktests nvme/tcp nvme/060 hang

Wed Aug 6 03:54:45 PDT 2025

On 8/6/25 08:44, Maurizio Lombardi wrote:
> On Wed Aug 6, 2025 at 8:22 AM CEST, Maurizio Lombardi wrote:
>> On Wed Aug 6, 2025 at 8:16 AM CEST, Maurizio Lombardi wrote:
>>>
>>> I think that the problem is due to the fact that nvmet_tcp_data_ready()
>>> calls the queue->data_ready() callback with the sk_callback_lock
>>> locked.
>>> The data_ready callback points to nvmet_tcp_listen_data_ready()
>>> which tries to lock the same sk_callback_lock, hence the deadlock.
>>>
>>> Maybe it can be fixed by deferring the call to queue->data_ready() by
>>> using a workqueue.
>>>
>>
>> Oops, sorry, they are two read locks; the real problem then is that
>> something is holding the write lock.
> 
> Ok, I think I get what happens now.
> 
> The thread that calls nvmet_tcp_data_ready() (which takes the read
> lock twice) and the thread that calls nvmet_tcp_release_queue_work()
> (which tries to take the write lock) are blocking each other.
> So I still think that deferring the call to queue->data_ready() by
> using a workqueue should fix it.
> 
It's nvmet_tcp_listen_data_ready() which is the problem; thing is, we only
need to take the lock to access 'sk_user_data' (as this might be
modified while the callback is running). But the 'sk_state' value can be
accessed without a lock, and as we need to look at sk_user_data only if
the socket is in TCP_LISTEN state (which I hope is not the case during
socket shutdown) we can move the check out of the lock and avoid
this issue.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich