[PATCH v2] nvme: tcp: avoid race between queue_lock lock and destroy

Keith Busch kbusch at kernel.org
Thu Oct 3 13:13:56 PDT 2024


On Wed, Oct 02, 2024 at 01:51:41PM +0900, Shin'ichiro Kawasaki wrote:
> From: Hannes Reinecke <hare at suse.de>
> 
> Commit 76d54bf20cdc ("nvme-tcp: don't access released socket during
> error recovery") added a mutex_lock() call for the queue->queue_lock
> in nvme_tcp_get_address(). However, the mutex_lock() races with the
> mutex_destroy() call in nvme_tcp_free_queue(), causing the WARN below.

<snip>

> The WARN is observed when the blktests test case nvme/014 is repeated
> with the tcp transport. It is rare: in some test environments, around
> 200 repetitions are required to recreate it.
> 
> To avoid the WARN, check the NVME_TCP_Q_LIVE flag before locking
> queue->queue_lock. The flag is cleared a long time before the lock is
> destroyed.
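
For reference, here is a minimal sketch of the two paths involved and
the proposed check. Function and field names follow
drivers/nvme/host/tcp.c, but the bodies are reduced to just the locking
that matters, so treat this as an illustration rather than the literal
diff:

/* teardown path: tears the queue down, then destroys the lock */
static void nvme_tcp_free_queue(struct nvme_ctrl *nctrl, int qid)
{
        struct nvme_tcp_queue *queue = &to_tcp_ctrl(nctrl)->queues[qid];

        /* NVME_TCP_Q_LIVE was already cleared much earlier, when the
         * queue was stopped; the socket is released here, then: */
        mutex_destroy(&queue->queue_lock);      /* races with ... */
}

/* sysfs path: may run concurrently with the teardown above */
static int nvme_tcp_get_address(struct nvme_ctrl *ctrl, char *buf, int size)
{
        struct nvme_tcp_queue *queue = &to_tcp_ctrl(ctrl)->queues[0];
        int len;

        len = nvmf_get_address(ctrl, buf, size);

        /* the fix: bail out before touching the mutex if the queue
         * is no longer live */
        if (!test_bit(NVME_TCP_Q_LIVE, &queue->flags))
                return len;

        mutex_lock(&queue->queue_lock);         /* ... this lock */
        /* append the source address read from queue->sock */
        mutex_unlock(&queue->queue_lock);

        return len;
}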

I've applied this to nvme-6.12, but the existence of this queue_lock
seems strange. It looks like tcp is relying on blk-mq's timeout to
individually complete requests after the queue is stopped, but I feel
like there should be a way to complete everything in a single batch. We
have the generic nvme_cancel_request() for this reason, but fabrics has
its own way of doing it one at a time?
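
For comparison, the batch path in the core looks roughly like the
sketch below (abbreviated from nvme_cancel_request() and
nvme_cancel_tagset() in drivers/nvme/host/core.c; a sketch, not the
exact source). blk_mq_tagset_busy_iter() visits every busy request
once, and the callback completes each in-flight request immediately, so
nothing waits for a per-request timeout:

bool nvme_cancel_request(struct request *req, void *data)
{
        /* leave completed or idle requests alone */
        if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT)
                return true;

        nvme_req(req)->status = NVME_SC_HOST_ABORTED_CMD;
        nvme_req(req)->flags |= NVME_REQ_CANCELLED;
        blk_mq_complete_request(req);
        return true;    /* keep iterating */
}

void nvme_cancel_tagset(struct nvme_ctrl *ctrl)
{
        if (ctrl->tagset) {
                /* complete every busy request in one pass ... */
                blk_mq_tagset_busy_iter(ctrl->tagset,
                                        nvme_cancel_request, ctrl);
                /* ... and wait until they have all been completed */
                blk_mq_tagset_wait_completed_request(ctrl->tagset);
        }
}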


