[PATCH v5 1/2] blk-mq: add tagset quiesce interface
Paul E. McKenney
paulmck at kernel.org
Tue Jul 28 20:31:24 EDT 2020
On Tue, Jul 28, 2020 at 04:46:23PM -0700, Sagi Grimberg wrote:
> Hey Paul,
>
> > Indeed you cannot. And if you build with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
> > it will yell at you when you try.
> >
> > You -can- pass on-stack rcu_head structures to call_srcu(), though,
> > if that helps. You of course must have some way of waiting for the
> > callback to be invoked before exiting that function. This should be
> > easy for me to package into an API, maybe using one of the existing
> > reference-counting APIs.
> >
> > So, do you have a separate stack frame for each of the desired call_srcu()
> > invocations? If not, do you know at build time how many rcu_head
> > structures you need? If the answer to both of these is "no", then
> > it is likely that there needs to be an rcu_head in each of the relevant
> > data structures, as was noted earlier in this thread.
> >
> > Yeah, I should go read the code. But I would need to know where it is
> > and it is still early in the morning over here! ;-)
> >
> > I probably should also have read the remainder of the thread before
> > replying, as well. But what is the fun in that?
>
> The use-case is to quiesce submissions to queues. This flow is where we
> want to tear things down, and we can potentially have 1000's of queues,
> each of which needs to be quiesced.
>
> Each queue (hctx) uses either RCU or SRCU, depending on whether it may
> sleep during submission.
>
> The goal is that the overall quiesce should be fast, so we want
> to wait for the grace periods of all of these queues ~once, in parallel,
> instead of synchronizing each one serially as is done today.
>
> The guys here are resisting adding an rcu_synchronize to each and
> every hctx because it would take 32 bytes, more or less, from each of
> 1000's of hctxs.
>
> Dynamically allocating each one is possible but not very scalable.
>
> The question is whether there is some way we can do this with an
> on-stack or a single on-heap rcu_head, or an equivalent, that achieves
> the same effect.

If the hctx structures are guaranteed to stay put, you could count
them and then do a single allocation of an array of rcu_head structures
(or some larger structure containing an rcu_head structure, if needed).
You could then sequence through this array, consuming one rcu_head per
hctx as you processed it. Once all the callbacks had been invoked,
it would be safe to free the array.
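
To make the above concrete, here is a rough userspace sketch of the
single-array idea.  It is only a model and not kernel code: cb_head,
model_call_rcu(), flush_callbacks(), quiesce_slot and quiesce_all() are
all hypothetical names invented for illustration.  In the kernel the
posting step would be call_rcu() or call_srcu() on a real rcu_head, and
the wait would be something like a completion rather than a direct call
that runs the callbacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Callbacks posted but not yet invoked. */
struct cb_head *pending;

/* Models call_rcu()/call_srcu(): queue the callback and return at once. */
void model_call_rcu(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Models the end of a grace period: invoke every queued callback. */
void flush_callbacks(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;
		head->func(head);
	}
}

/* The "larger structure containing an rcu_head": one slot per hctx. */
struct quiesce_slot {
	struct cb_head head;
	size_t *remaining;	/* shared count of callbacks still outstanding */
};

/* Per-hctx callback: recover the slot from its head and count it done. */
void quiesce_done(struct cb_head *head)
{
	struct quiesce_slot *slot = (struct quiesce_slot *)
		((char *)head - offsetof(struct quiesce_slot, head));

	(*slot->remaining)--;
}

/*
 * Post one callback per hctx from a single array allocation, wait for
 * all of them, then free the array.  Returns the number of callbacks
 * that were invoked, or -1 if the allocation failed.
 */
int quiesce_all(size_t nr_hctx)
{
	size_t remaining = nr_hctx;
	struct quiesce_slot *slots;
	size_t i;

	if (nr_hctx == 0)
		return 0;

	slots = calloc(nr_hctx, sizeof(*slots));	/* one allocation */
	if (!slots)
		return -1;

	/* Consume one head per hctx; none of these calls blocks. */
	for (i = 0; i < nr_hctx; i++) {
		slots[i].remaining = &remaining;
		model_call_rcu(&slots[i].head, quiesce_done);
	}

	/* Stand-in for waiting until every callback has been invoked. */
	flush_callbacks();

	/* Only now is it safe to free the array. */
	assert(remaining == 0);
	free(slots);
	return (int)(nr_hctx - remaining);
}
```

Calling quiesce_all(1000) posts 1000 callbacks up front and waits once
for the lot, rather than waiting 1000 times in series.  The array must
not be freed until the last callback has run, which is why the free
comes after the wait.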

Sounds too simple, though. So what am I missing?

Thanx, Paul