[PATCH] nvme: fix SRCU protection of nvme_ns_head list

Christoph Hellwig hch at lst.de
Tue Nov 22 04:14:49 PST 2022


On Tue, Nov 22, 2022 at 12:06:57PM +0200, Sagi Grimberg wrote:
> I don't think rcu_mult is necessary.

It would also be really cumbersome to use.

> Lets look at nvme_ns_remove.
>
> 1. clears NVME_NS_READY + synchronize_srcu()
>    -> guarantees that nshead->current_path is not re-selecting this ns

Yes.

> 2. nvme_mpath_clear_current_path + synchronize_srcu()
>    -> if this ns happens to be the current_path -> clear it and fence
>       concurrent bio submissions

Yes.

> 3. removes ns from head sibling list + synchronize rcu
>    -> this should fence non-sleeping traversals (like revalidate_paths)

Well, non-sleeping would only matter if those non-sleeping travesals
are under rcu_read_lock(), but they are not.  They are either part of
a longer srcu critical section because other code can sleep, or in
case of revalidate_paths unprotected at all (which this patch fixes).

> Maybe it is OK to have it also srcu locked and just accept that
> nshead sibling list is srcu protected. In that case, your patch
> needs to extend the srcu also the clearing of current_head pointer.

I don't see how nvme_mpath_clear_current_path needs (S)RCU protection.
It never dereferences the current_path, it just checks is for pointer
equality and if they match clears it to NULL.  (I wonder if it should
use cmpxchg though).

> But looking again at your bug report, you mention that there are
> concurrent scans, one removing the ns and another accessing it.
> That cannot happen due to the scan_lock held around this section afaict.
>
> I guess it can be that in general ns removal can compete with a scan
> if due to some controller behavior that failed an identify command
> transiently in a prior scan, and a subsequent scan finds it? worth
> pinning down exactly what happened in the race you got because maybe we
> have a different issue that may manifest in other issues.

So scanning itself should be single threaded as it only happens from
the workqueue.  But nvme_ns_remove can be called from
nvme_remove_namespaces in in 6.1 and earlier from the passthrough
handler.



More information about the Linux-nvme mailing list