[PATCH v2 RFC 6/6] nvme-core: fix deadlock in disconnect during scan_work and/or ana_work

Christoph Hellwig hch at lst.de
Wed Jun 24 02:43:09 EDT 2020


On Tue, Jun 23, 2020 at 05:18:53PM -0700, Sagi Grimberg wrote:
> From: Anton Eidelman <anton at lightbitslabs.com>
> 
> A deadlock happens in the following scenario with multipath:
> 1) scan_work(nvme0) detects a new nsid while nvme0
>     is an optimized path to it, path nvme1 happens to be
>     inaccessible.
> 
> 2) Before scan_work is complete nvme0 disconnect is initiated
>     nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING
> 
> 3) scan_work(1) attempts to submit IO,
>     but nvme_path_is_optimized() observes nvme0 is not LIVE.
>     Since nvme1 is a possible path IO is requeued and scan_work hangs.

I'm really worried about another flag outside the state machine.  If
we really need a multi-step deletion, we should add states such as
NVME_CTRL_DELETE_START and NVME_CTRL_DELETE_CONT and run this via the
state machine.
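
As a rough illustration of what a two-step deletion expressed as states
(rather than as an extra flag) might look like, here is a minimal
user-space sketch.  Everything in it is an assumption made for
illustration: the enum, struct ctrl, ctrl_change_state() and
ctrl_path_usable() are hypothetical names, not the nvme-core code, and
the transition table is only one plausible choice.  In particular, the
idea that the first deletion step still accepts I/O while the second
does not is a guess at how the split could avoid the hang described
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical controller states. DELETE_START and DELETE_CONT mirror the
 * names floated in this mail; they are not existing kernel states.
 */
enum ctrl_state {
	CTRL_NEW,
	CTRL_LIVE,
	CTRL_RESETTING,
	CTRL_CONNECTING,
	CTRL_DELETE_START,	/* assumed: teardown begun, I/O still accepted */
	CTRL_DELETE_CONT,	/* assumed: teardown continues, no new I/O */
	CTRL_DEAD,
};

struct ctrl {
	enum ctrl_state state;
};

/*
 * Move to new_state only if the transition from the current state is
 * allowed. Returns true if the state was changed. The allowed transitions
 * are an illustrative assumption.
 */
static bool ctrl_change_state(struct ctrl *ctrl, enum ctrl_state new_state)
{
	enum ctrl_state old;
	bool ok = false;

	assert(ctrl != NULL);
	old = ctrl->state;

	switch (new_state) {
	case CTRL_LIVE:
		ok = old == CTRL_NEW || old == CTRL_RESETTING ||
		     old == CTRL_CONNECTING;
		break;
	case CTRL_RESETTING:
		ok = old == CTRL_NEW || old == CTRL_LIVE;
		break;
	case CTRL_CONNECTING:
		ok = old == CTRL_NEW || old == CTRL_RESETTING;
		break;
	case CTRL_DELETE_START:
		ok = old == CTRL_LIVE || old == CTRL_RESETTING ||
		     old == CTRL_CONNECTING;
		break;
	case CTRL_DELETE_CONT:
		/* The second step is reachable only from the first one. */
		ok = old == CTRL_DELETE_START;
		break;
	case CTRL_DEAD:
		ok = old == CTRL_DELETE_START || old == CTRL_DELETE_CONT;
		break;
	default:
		break;
	}

	if (ok)
		ctrl->state = new_state;
	return ok;
}

/*
 * Path-selection check under the assumption above: a controller in the
 * first deletion step may still carry I/O, so work that is already
 * running can complete instead of being requeued forever.
 */
static bool ctrl_path_usable(const struct ctrl *ctrl)
{
	return ctrl->state == CTRL_LIVE || ctrl->state == CTRL_DELETE_START;
}
```

With this shape, a disconnect would first move the controller to the
DELETE_START state, wait for scan_work and ana_work to finish, and only
then move to DELETE_CONT, after which the path check refuses the
controller.  Because every step is a transition, invalid orderings (for
example going back to LIVE once deletion has started) are rejected in
one place.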


