[PATCH v2 1/2] blk-mq: add tagset quiesce interface

Paul E. McKenney paulmck at kernel.org
Mon Oct 17 08:21:36 PDT 2022


On Mon, Oct 17, 2022 at 03:39:06PM +0200, Christoph Hellwig wrote:
> On Thu, Oct 13, 2022 at 05:44:49PM +0800, Chao Leng wrote:
> > +	rcu = kvmalloc(count * sizeof(*rcu), GFP_KERNEL);
> > +	if (rcu) {
> > +		list_for_each_entry(q, &set->tag_list, tag_set_list) {
> > +			if (blk_queue_noquiesced(q))
> > +				continue;
> > +
> > +			init_rcu_head(&rcu[i].head);
> > +			init_completion(&rcu[i].completion);
> > +			call_srcu(q->srcu, &rcu[i].head, wakeme_after_rcu);
> > +			i++;
> > +		}
> > +
> > +		for (i = 0; i < count; i++) {
> > +			wait_for_completion(&rcu[i].completion);
> > +			destroy_rcu_head(&rcu[i].head);
> > +		}
> > +		kvfree(rcu);
> > +	} else {
> > +		list_for_each_entry(q, &set->tag_list, tag_set_list)
> > +			synchronize_srcu(q->srcu);
> > +	}
> 
> Having to allocate a struct rcu_synchronize for each of the potentially
> many queues here is a bit sad.
> 
> Pull just explained the start_poll_synchronize_rcu interfaces at ALPSS
> last week, so I wonder if something like that would also be feasible
> for SRCU, as that would come in really handy here.

There is start_poll_synchronize_srcu() and poll_state_synchronize_srcu(),
but there would need to be an unsigned long for each srcu_struct from
which an SRCU grace period was required.  This would be half the size
of the "rcu" array above, but still maybe larger than you would like.

The resulting code might look something like this, with "rcu" now being
a pointer to unsigned long:

	rcu = kvmalloc(count * sizeof(*rcu), GFP_KERNEL);
	if (rcu) {
		list_for_each_entry(q, &set->tag_list, tag_set_list) {
			if (blk_queue_noquiesced(q))
				continue;
			rcu[i] = start_poll_synchronize_srcu(q->srcu);
			i++;
		}

		for (i = 0; i < count; i++)
			if (!poll_state_synchronize_srcu(q->srcu))
				synchronize_srcu(q->srcu);
		kvfree(rcu);
	} else {
		list_for_each_entry(q, &set->tag_list, tag_set_list)
			synchronize_srcu(q->srcu);
	}

Or as Christoph suggested, just have a single srcu_struct for the
whole group.

The main reason for having multiple srcu_struct structures is to
prevent the readers from one from holding up the updaters from another.
Except that by waiting for the multiple grace periods, you are losing
that property anyway, correct?  Or is this code waiting on only a small
fraction of the srcu_struct structures associated with blk_queue?

							Thanx, Paul



More information about the Linux-nvme mailing list