[PATCH] nvme: fix SRCU protection of nvme_ns_head list
Christoph Hellwig
hch at lst.de
Tue Nov 22 04:14:49 PST 2022
On Tue, Nov 22, 2022 at 12:06:57PM +0200, Sagi Grimberg wrote:
> I don't think rcu_mult is necessary.
It would also be really cumbersome to use.
> Lets look at nvme_ns_remove.
>
> 1. clears NVME_NS_READY + synchronize_srcu()
> -> guarantees that nshead->current_path is not re-selecting this ns
Yes.
> 2. nvme_mpath_clear_current_path + synchronize_srcu()
> -> if this ns happens to be the current_path -> clear it and fence
> concurrent bio submissions
Yes.
> 3. removes ns from head sibling list + synchronize rcu
> -> this should fence non-sleeping traversals (like revalidate_paths)
Well, non-sleeping would only matter if those non-sleeping travesals
are under rcu_read_lock(), but they are not. They are either part of
a longer srcu critical section because other code can sleep, or in
case of revalidate_paths unprotected at all (which this patch fixes).
> Maybe it is OK to have it also srcu locked and just accept that
> nshead sibling list is srcu protected. In that case, your patch
> needs to extend the srcu also the clearing of current_head pointer.
I don't see how nvme_mpath_clear_current_path needs (S)RCU protection.
It never dereferences the current_path, it just checks is for pointer
equality and if they match clears it to NULL. (I wonder if it should
use cmpxchg though).
> But looking again at your bug report, you mention that there are
> concurrent scans, one removing the ns and another accessing it.
> That cannot happen due to the scan_lock held around this section afaict.
>
> I guess it can be that in general ns removal can compete with a scan
> if due to some controller behavior that failed an identify command
> transiently in a prior scan, and a subsequent scan finds it? worth
> pinning down exactly what happened in the race you got because maybe we
> have a different issue that may manifest in other issues.
So scanning itself should be single threaded as it only happens from
the workqueue. But nvme_ns_remove can be called from
nvme_remove_namespaces in in 6.1 and earlier from the passthrough
handler.
More information about the Linux-nvme
mailing list