don't reorder requests passed to ->queue_rqs
Bart Van Assche
bvanassche at acm.org
Wed Nov 13 10:33:46 PST 2024
On 11/13/24 7:20 AM, Christoph Hellwig wrote:
> currently blk-mq reorders requests when adding them to the plug because
> the request list can't do efficient tail appends. When the plug is
> directly issued using ->queue_rqs that means reordered requests are
> passed to the driver, which can lead to very bad I/O patterns when
> not corrected, especially on rotational devices (e.g. NVMe HDD) or
> when using zone append.
>
> This series first adds two easily backportable workarounds to reverse
> the reordering in the virtio_blk and nvme-pci ->queue_rqs implementations,
> similar to what the non-queue_rqs path does, and then adds an rq_list
> type that allows for efficient tail insertions and uses that to fix
> the reordering for real and then does the same for I/O completions as
> well.
Hi Christoph,
Could something like the patch below replace this patch series? I don't
have a strong opinion about which approach to select.
I'm sharing this patch because this is what I came up with while looking
into how to support QD>1 for zoned devices with a storage controller that
preserves the request order (UFS).
Thanks,
Bart.
block: Make the plugging mechanism preserve the request order
Requests are added to the front of the plug list while dispatching happens
in list order. Hence, requests are dispatched in reverse submission order.
Restore the submission order by reversing the plug list before dispatching.
This patch is a
modified version of a patch from Jens
(https://lore.kernel.org/linux-block/1872ae0a-6ba6-45f5-9f3d-8451ce06eb14@kernel.dk/).
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3533bd808072..bf2ea421b2e8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2868,6 +2868,22 @@ static void blk_mq_dispatch_plug_list(struct blk_plug *plug, bool from_sched)
percpu_ref_put(&this_hctx->queue->q_usage_counter);
}
+/* See also llist_reverse_order(). */
+static void blk_plug_reverse_order(struct blk_plug *plug)
+{
+ struct request *rq = plug->mq_list, *new_head = NULL;
+
+ while (rq) {
+ struct request *tmp = rq;
+
+ rq = rq->rq_next;
+ tmp->rq_next = new_head;
+ new_head = tmp;
+ }
+
+ plug->mq_list = new_head;
+}
+
void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
struct request *rq;
@@ -2885,6 +2901,8 @@ void blk_mq_flush_plug_list(struct blk_plug *plug, bool from_schedule)
depth = plug->rq_count;
plug->rq_count = 0;
+ blk_plug_reverse_order(plug);
+
if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
struct request_queue *q;
More information about the Linux-nvme mailing list