[PATCH] nvme: fix SRCU protection of nvme_ns_head list
Sagi Grimberg
sagi at grimberg.me
Tue Nov 22 07:08:34 PST 2022
>> 3. removes ns from head sibling list + synchronize rcu
>> -> this should fence non-sleeping traversals (like revalidate_paths)
>
> Well, non-sleeping would only matter if those non-sleeping traversals
> are under rcu_read_lock(), but they are not. They are either part of
> a longer srcu critical section because other code can sleep, or in
> case of revalidate_paths unprotected at all (which this patch fixes).
The original patch comment was that rcu_read_lock/unlock would be
sufficient and that we don't need to touch nvme_ns_remove().
>
>> Maybe it is OK to have it also srcu locked and just accept that
>> nshead sibling list is srcu protected. In that case, your patch
>> needs to extend the srcu to also cover the clearing of current_head pointer.
>
> I don't see how nvme_mpath_clear_current_path needs (S)RCU protection.
> It never dereferences the current_path, it just checks it for pointer
> equality and if they match clears it to NULL. (I wonder if it should
> use cmpxchg though).
Agreed, it can stay out, because at this point it does not compete with
concurrent submissions due to the prior synchronizations. The list
traversal needs to be under the rcu lock.
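
For illustration only, the compare-and-clear that the cmpxchg remark
alludes to can be sketched in userspace C. This is not the kernel code:
C11 atomic_compare_exchange_strong() stands in for cmpxchg(), the
struct ns and struct ns_head types are hypothetical stand-ins, and a
single current_path pointer is assumed for simplicity. The point it
shows is that the pointer is only compared, never dereferenced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real layout. */
struct ns {
	int id;
};

struct ns_head {
	_Atomic(struct ns *) current_path;
};

/*
 * Clear head->current_path only if it still points at ns.
 * The stored pointer is compared for equality and never dereferenced.
 * Returns true if this call replaced the pointer with NULL.
 */
static bool clear_current_path(struct ns_head *head, struct ns *ns)
{
	struct ns *expected = ns;

	assert(head != NULL);
	return atomic_compare_exchange_strong(&head->current_path,
					      &expected,
					      (struct ns *)NULL);
}
```

With the compare and the store done as one atomic step, a path that was
switched to a different ns in between is left alone rather than being
overwritten with NULL.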
>
>> But looking again at your bug report, you mention that there are
>> concurrent scans, one removing the ns and another accessing it.
>> That cannot happen due to the scan_lock held around this section afaict.
>>
>> I guess it can be that in general ns removal can compete with a scan
>> if, due to some controller behavior, an identify command failed
>> transiently in a prior scan and a subsequent scan finds it? Worth
>> pinning down exactly what happened in the race you got, because maybe
>> we have a different issue that may manifest in other ways.
>
> So scanning itself should be single threaded as it only happens from
> the workqueue. But nvme_ns_remove can be called from
> nvme_remove_namespaces and, in 6.1 and earlier, from the passthrough
> handler.
The original patch report did not include any sequence that removes all
namespaces, and given that it came from RockyLinux 8.6 kernel, it is not
6.1... Hence I think that we need to understand how a namespace removal
happened at the same time that the namespace is being scanned. Maybe
something else is broken.