lockdep WARNING at blktests block/011

Tue Oct 11 10:11:54 PDT 2022

On 10/10/22 06:31, Keith Busch wrote:
> On Fri, Oct 7, 2022 at 10:34 PM Bart Van Assche <bvanassche at acm.org> wrote:
>>
>> On 10/3/22 08:28, Keith Busch wrote:
>>> On Mon, Oct 03, 2022 at 01:32:41PM +0000, Shinichiro Kawasaki wrote:
>>>>
>>>> BTW, I came up to another question during code read. I found nvme_reset_work()
>>>> calls nvme_dev_disable() before nvme_sync_queues(). So, I think the NVME
>>>> controller is already disabled when the reset work calls nvme_sync_queues().
>>>
>>> Right, everything previously outstanding has been reclaimed, and the queues are
>>> quiesced at this point. There's nothing for timeout work to wait for, and the
>>> sync is just ensuring every timeout work has returned.
>>>
>>> It looks like a timeout is required in order to hit this reported deadlock, but
>>> the driver ensures there's nothing to timeout prior to syncing the queues. I
>>> don't think lockdep could reasonably know that, though.
>>
>> Hi Keith,
>>
>> Commit b2a0eb1a0ac7 ("nvme-pci: Remove watchdog timer") introduced the
>> nvme_dev_disable() and nvme_reset_ctrl() calls in the nvme_timeout()
>> function. Has it been considered to invoke these two calls asynchronously
>> instead of synchronously from the NVMe timeout handler (queue_work())?
>> Although it may require some work to make sure that this approach does not
>> trigger any race conditions, do you agree that this should be sufficient to
>> make lockdep happy?
> 
> We still have to sync whatever work does the reset, so that would just
> shift which work the lockdep splat indicates.

Hi Keith,

It seems like my email was not clear enough? What I meant is to queue
asynchronous work from inside the timeout handler and to wait for the
completion of that work from *outside* the timeout handler. This is not a
new approach. As an example, the SCSI core queues abort work from inside the
timeout handler and only allows new SCSI commands to be queued after error
handling has finished. I'm not claiming that this approach should be followed
by the NVMe driver - I'm only mentioning this as an example.

Thanks,

Bart.