blktests failures with v7.1-rc1 kernel

Mon May 25 05:44:14 PDT 2026

Hi Shinichiro,

On 4/28/26 2:43 PM, Shin'ichiro Kawasaki wrote:
> Hi all,
> 
> I ran the latest blktests (git hash: ea5472c1adc8) with the v7.1-rc1 kernel. I
> observed 8 failures listed below. Comparing with the previous report for the
> v7.0 kernel [1], 2 failures are new (nvme/045, scsi/002). Your actions for fix
> will be welcomed as always.
> 
> [1]https://lore.kernel.org/linux-block/aeCDXI5hY_ivSWm4@shinmob/
> 
> 
> List of failures
> ================
> #1: nvme/005,063 (tcp transport)
> #2: nvme/045 (new)(kmemleak)
> #3: nvme/058 (fc transport)(hang)(kmemleak)
> #4: nvme/060
> #5: nvme/061 (rdma transport, siw driver)(kmemleak)
> #6: nvme/061 (fc transport)
> #7: nbd/002
> #8: scsi/002 (new)
> 
> 
> Failure description
> ===================
> 
> #1: nvme/005,063 (tcp transport)
> 
>      The test cases nvme/005 and 063 fail for tcp transport due to the lockdep
>      WARN related to the three locks q->q_usage_counter, q->elevator_lock and
>      set->srcu. The failure was reported first time for nvme/063 and v6.16-rc1
>      kernel [2].
> 
>      Chaitanya provided a fix patch (thanks!), and it is queued for v7.1-rcX tags
>      [3]. However, nvme/005 and 063 still fail even when I apply the fix patch to
>      v7.1-rc1 kernel. The call traces of the lockdep WARN are different between
>      "v7.1-rc1" kernel [4] and "v7.1-rc1+the fix patch" kernel [5]. I guess that
>      there exist two lockdep problems with similar symptoms and patch [3] fixed
>      one of them. I guess that still one problem is left.
> 
>      [2]https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>      [3]https://lore.kernel.org/all/20260413171628.6204-1-kch@nvidia.com/

I looked into this lockdep warning, and it seems that Chaitanya's patch indeed fixes the
original issue reported in [4]. However, the new warning reported in [5] appears to be a
separate lockdep splat and, from what I can tell, likely a false positive. There are two
reasons why I think so:

1. The lockdep report suggests that thread #1 is sending data over a TCP socket while
    another thread #2 is still in the process of establishing that same socket connection.
    In practice, this should not be possible because request dispatch over the socket can
    only happen after the connection setup has completed successfully.

2. The warning also suggests that while thread #0 is deleting the gendisk and unregistering
    the corresponding request queue, another thread #5 is concurrently attempting to change
    the queue elevator. However, once gendisk deletion starts, elevator switching is already
    inhibited for that queue (see disable_elv_switch()), so the reported locking scenario
    should not be reachable in practice.
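
To illustrate the second point, here is a minimal user-space sketch of the gating idea,
not the kernel implementation. All names (struct model_queue, model_disable_elv_switch(),
model_elv_switch()) and the -ENOENT return value are assumptions made for this model. It
only shows why a switch request cannot proceed once the deletion path has set the
inhibit flag under the same lock:

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical scheduler identifiers used only by this model. */
enum model_elv { MODEL_ELV_NONE, MODEL_ELV_MQ_DEADLINE, MODEL_ELV_KYBER };

/* Hypothetical user-space stand-in for a request queue; not kernel code. */
struct model_queue {
	pthread_mutex_t lock;    /* serializes the inhibit flag and switches */
	bool no_elv_switch;      /* set once disk deletion has started */
	enum model_elv elevator; /* currently selected scheduler */
};

#define MODEL_QUEUE_INIT { PTHREAD_MUTEX_INITIALIZER, false, MODEL_ELV_NONE }

/* Deletion path: inhibit further switches before any teardown happens. */
static void model_disable_elv_switch(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->no_elv_switch = true;
	pthread_mutex_unlock(&q->lock);
}

/*
 * Switch path: returns 0 on success, or -ENOENT (the model's choice of
 * error code) when the queue is already being deleted.
 */
static int model_elv_switch(struct model_queue *q, enum model_elv e)
{
	int ret = 0;

	pthread_mutex_lock(&q->lock);
	if (q->no_elv_switch)
		ret = -ENOENT;   /* deletion started: leave the elevator alone */
	else
		q->elevator = e;
	pthread_mutex_unlock(&q->lock);

	return ret;
}
```

In this model, a call to model_elv_switch() made before model_disable_elv_switch()
returns 0 and updates the elevator, while any call made afterwards returns -ENOENT and
leaves the elevator unchanged. Because the flag is checked under the same lock that sets
it, the switch and the teardown that follows the flag update cannot interleave.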

Based on the above, I suspect this is a lockdep false positive: lockdep chains together
dependencies recorded in different queue/socket lifecycle phases that cannot actually overlap
at runtime. We may need to suppress lock dependency tracking in some of these paths to avoid
the false warning.

Thanks,
--Nilay