[PATCH] nvme: fix SRCU protection of nvme_ns_head list

Caleb Sander csander at purestorage.com
Mon Nov 21 11:58:53 PST 2022


On Mon, Nov 21, 2022 at 9:59 AM Paul E. McKenney <paulmck at kernel.org> wrote:
>
> On Mon, Nov 21, 2022 at 09:48:31AM -0800, Caleb Sander wrote:
> > On Sun, Nov 20, 2022 at 11:40 PM Christoph Hellwig <hch at lst.de> wrote:
> > >
> > > On Sun, Nov 20, 2022 at 01:24:51PM +0200, Sagi Grimberg wrote:
> > > >>      sector_t capacity = get_capacity(head->disk);
> > > >>      int node;
> > > >> +    int srcu_idx;
> > > >> +
> > > >> +    srcu_idx = srcu_read_lock(&head->srcu);
> > > >>      list_for_each_entry_rcu(ns, &head->list, siblings) {
> > > >>              if (capacity != get_capacity(ns->disk))
> > > >>                      clear_bit(NVME_NS_READY, &ns->flags);
> > > >>      }
> > > >> +    srcu_read_unlock(&head->srcu, srcu_idx);
> > > >
> > > > I don't think you need srcu here, rcu_read_lock/unlock is sufficient.
> > >
> > > So the code obviously does not sleep.  But I wonder if anything speaks
> > > against mixing SRCU and RCU protection for the same list?
> >
> > As I understand it, synchronize_rcu() only synchronizes with RCU critical sections,
> > and likewise synchronize_srcu() only synchronizes with SRCU critical sections.
> > Currently nvme_ns_head_submit_bio() is using srcu_read_lock()
> > but nvme_ns_remove() is using synchronize_rcu(),
> > so there is no synchronization at all.
> > Since nvme_ns_head_submit_bio() sleeps during the critical section,
> > we definitely need to keep using SRCU there,
> > and therefore nvme_ns_remove() needs to use synchronize_srcu().
> > I agree RCU would work in nvme_mpath_revalidate_paths(), but as Paul points out,
> > nvme_ns_remove() would then need to synchronize with both SRCU and RCU.
> > This seems like it would complicate nvme_ns_remove()
> > and require waiting on readers of unrelated RCU-protected structures.
> > Is there some cost to using SRCU over RCU that I am missing?
>
> The cost is that srcu_read_lock() and srcu_read_unlock() both invoke
> smp_mb().  This might or might not be a problem for you, depending on
> how fast your fastpaths need to be.  Another cost is the need to pass
> the return value of srcu_read_lock() to srcu_read_unlock(), which is
> usually more of a software-engineering cost than it is a performance cost.

Thanks, that's great to know! I think this path is probably not
performance-critical, since it's called only when the set of connected
NVMe namespaces changes.
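
Given that, the plan would be to keep head->srcu on both sides. A rough
sketch of the removal-side change in nvme_ns_remove() (abbreviated to
the relevant lines; lock and field names as I recall them from
drivers/nvme/host/core.c, so the exact context may differ):

	mutex_lock(&ns->ctrl->subsys->lock);
	list_del_rcu(&ns->siblings);
	mutex_unlock(&ns->ctrl->subsys->lock);

	/*
	 * Both readers of head->list (nvme_ns_head_submit_bio() and
	 * nvme_mpath_revalidate_paths()) would then hold head->srcu,
	 * so waiting on that one srcu_struct covers them all.
	 */
	synchronize_srcu(&ns->head->srcu);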

> Of course, the easiest thing is to try it both ways and see if you can
> measure the difference.  If you are unsure whether or not it is safe
> to replace rcu_read_lock() with srcu_read_lock(), you can just add
> smp_mb() after each rcu_read_lock() and before each rcu_read_unlock().
> That won't be exact, but it will get you quite close to the cost of
> srcu_read_lock() and srcu_read_unlock().
>
>                                                         Thanx, Paul
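
For what it's worth, I believe the approximation you describe would
look something like this in nvme_mpath_revalidate_paths() (illustrative
only, to estimate the cost without switching to SRCU):

	rcu_read_lock();
	smp_mb();	/* approximate the barrier in srcu_read_lock() */
	list_for_each_entry_rcu(ns, &head->list, siblings) {
		if (capacity != get_capacity(ns->disk))
			clear_bit(NVME_NS_READY, &ns->flags);
	}
	smp_mb();	/* approximate the barrier in srcu_read_unlock() */
	rcu_read_unlock();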

Regarding your suggestion to use synchronize_rcu_mult() to wait concurrently,
I'm not sure it's possible to write an equivalent of call_my_srcu() here,
since the srcu_struct is per-nvme_ns_head rather than global.
I think we would probably need to synchronize with RCU and SRCU sequentially.
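
The example in the synchronize_rcu_mult() documentation relies on the
srcu_struct being a global, e.g.:

	DEFINE_SRCU(my_srcu);

	static void call_my_srcu(struct rcu_head *head, rcu_callback_t func)
	{
		call_srcu(&my_srcu, head, func);
	}

	/* ... synchronize_rcu_mult(call_rcu, call_my_srcu); */

Since the wrapper's signature is fixed at (struct rcu_head *,
rcu_callback_t), there's no obvious way to pass the per-head srcu_struct
pointer through to call_srcu().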

Thanks,
Caleb
