[PATCH v2 RFC 6/6] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Sagi Grimberg sagi at grimberg.me
Wed Jun 24 03:13:12 EDT 2020


>> From: Anton Eidelman <anton at lightbitslabs.com>
>>
>> A deadlock happens in the following scenario with multipath:
>> 1) scan_work(nvme0) detects a new nsid while nvme0
>>      is an optimized path to it, path nvme1 happens to be
>>      inaccessible.
>>
>> 2) Before scan_work completes, nvme0 disconnect is initiated:
>>      nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING.
>>
>> 3) scan_work(1) attempts to submit IO,
>>      but nvme_path_is_optimized() observes nvme0 is not LIVE.
>>      Since nvme1 is a possible path, the IO is requeued and scan_work hangs.
> 
> I'm really worried about another flag outside the state machine.  If
> we really need a multi-step deletion we should have
> NVME_CTRL_DELETE_START, NVME_CTRL_DELETE_CONT or so states and run
> this via the state machine.

Let me look into this. I'll send out v3 of the other 5 patches and
follow up with this one as a separate follow-on series. Going via the
state machine is slightly more delicate, but I agree that it's a better
approach.
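
For illustration only, here is a rough sketch of what routing a
multi-step deletion through the state machine could look like. The two
delete states take their names from the suggestion quoted above. The
enum, both helper names, and the rule for when I/O stops being requeued
are assumptions made for this sketch, not the actual nvme-core code.

```c
#include <stdbool.h>

/* Toy model only; this is not the kernel's enum nvme_ctrl_state. */
enum ctrl_state {
	CTRL_LIVE,
	CTRL_DELETE_START,	/* hypothetical: teardown has begun */
	CTRL_DELETE_CONT,	/* hypothetical: teardown continues, no new I/O */
	CTRL_DEAD,
};

/* Return true if moving from old_state to new_state is allowed. */
static bool ctrl_transition_allowed(enum ctrl_state old_state,
				    enum ctrl_state new_state)
{
	switch (new_state) {
	case CTRL_DELETE_START:
		return old_state == CTRL_LIVE;
	case CTRL_DELETE_CONT:
		return old_state == CTRL_DELETE_START;
	case CTRL_DEAD:
		return old_state == CTRL_DELETE_CONT;
	default:
		/* Nothing in this toy model transitions back to LIVE. */
		return false;
	}
}

/*
 * Assumed policy: I/O may be requeued only up to the first delete step.
 * From the second step on it fails immediately, so a worker that is
 * waiting on that I/O can finish instead of hanging.
 */
static bool ctrl_may_requeue_io(enum ctrl_state state)
{
	return state == CTRL_LIVE || state == CTRL_DELETE_START;
}
```

The point of the sketch is that each deletion step is an explicit state
with checked transitions, so the "stop requeueing" decision hangs off the
state itself rather than off a separate flag.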


