[PATCH v2 1/2] blk-mq: add tagset quiesce interface

Sagi Grimberg sagi at grimberg.me
Wed Oct 19 00:15:26 PDT 2022


>>> Then the big question is "how long do the SRCU readers run?"
>>>
>>> If all of the readers ran for exactly the same duration, there would be
>>> little point in having more than one srcu_struct.
>>
>> The SRCU readers are the I/O dispatch, which will have quite similar
>> runtimes for the different queues.
>>
>>> If the kernel knew up front how long the SRCU readers for a given entry
>>> would run, it could provide an srcu_struct structure for each duration.
>>> For a (fanciful) example, you could have one srcu_struct structure for
>>> SSDs, another for rotating rust, a third for LAN-attached storage, and
>>> a fourth for WAN-attached storage.  Maybe a fifth for lunar-based storage.
>>
>> All the different request_queues in a tag_set are for the same device.
>> There might be some corner cases like storage arrays where they have
>> different latencies.  But we're not even waiting for the I/O completion
>> here, this just protects the submission.
>>
>>> Does that help, or am I off in the weeds here?
>>
>> I think this was very helpful, and at least moving the srcu_struct
>> to the tag_set sounds like a good idea to explore.
>>
>> Ming, anything I might have missed?
> 
> I think it is fine to move it to tag_set, this way could simplify a
> lot.
> 
> The effect could be that blk_mq_quiesce_queue() becomes a little
> slower, but it is always in the slow path, and synchronize_srcu() won't
> wait for new read-side critical sections.
> 
> Just one corner case: with BLK_MQ_F_BLOCKING, is there any driver
> which may sometimes block too long in ->queue_rq(), such as waiting for
> dozens of seconds?

nvme-tcp will opportunistically try a network send directly from
.queue_rq(), but always with MSG_DONTWAIT, so that is not a problem.

nbd, though, can block in .queue_rq() with blocking network sends. However,
afaict nbd allocates a tagset per nbd_device, and also a request_queue
per device, so it's effectively the same whether the srcu is in the tagset
or in the request_queue.
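
To illustrate why MSG_DONTWAIT keeps the send from stalling, here is a
minimal userspace sketch. It is only an analogy, not nvme-tcp code: the
helper names try_send_nowait() and fill_until_would_block() are made up
for this example, and it uses the POSIX send() call rather than the
in-kernel socket API. The point is just that a full socket buffer makes
the call return immediately with EAGAIN instead of sleeping, so the
caller can defer the rest of the work.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (userspace analogy only): attempt a send that never
 * sleeps. Returns the number of bytes sent, 0 if the socket buffer is full
 * and the caller should defer the send, or -1 on any other error.
 */
static ssize_t try_send_nowait(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(len > 0);	/* keep a 0 return unambiguous */

	n = send(fd, buf, len, MSG_DONTWAIT);
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;	/* would block: leave it to a deferred context */
	return n;
}

/*
 * Hypothetical demo: keep sending on fd until the next send would block
 * (or fails), and report how many bytes were accepted.
 */
static size_t fill_until_would_block(int fd)
{
	char chunk[4096] = {0};
	size_t total = 0;
	ssize_t n;

	while ((n = try_send_nowait(fd, chunk, sizeof(chunk))) > 0)
		total += (size_t)n;
	return total;
}
```

With a blocking send (no MSG_DONTWAIT) the loop above would instead sleep
once the peer stops draining the socket, which is the nbd-style behaviour
that could hold a reader inside the SRCU read-side section.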


