[PATCH v5 07/28] block: Introduce zone write plugging
Bart Van Assche
bvanassche at acm.org
Thu Apr 4 11:31:19 PDT 2024
On 4/3/24 01:42, Damien Le Moal wrote:
> diff --git a/block/bio.c b/block/bio.c
> index d24420ed1c4c..127c06443bca 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1576,6 +1576,12 @@ void bio_endio(struct bio *bio)
> if (!bio_integrity_endio(bio))
> return;
>
> + /*
> + * For write BIOs to zoned devices, signal the completion of the BIO so
> + * that the next write BIO can be submitted by zone write plugging.
> + */
> + blk_zone_write_bio_endio(bio);
> +
> rq_qos_done_bio(bio);
The function name "blk_zone_write_bio_endio()" is misleading since that
function is called for all bio operation types and not only for zoned
write bios. How about renaming blk_zone_write_bio_endio() into
blk_zone_bio_endio()? If that function is renamed the above comment is
no longer necessary in bio_endio() and can be moved to above the
blk_zone_bio_endio() definition.
> @@ -1003,6 +1007,13 @@ static enum bio_merge_status bio_attempt_front_merge(struct request *req,
> + /*
> + * A front merge for zone writes can happen only if the user submitted
> + * writes out of order. Do not attempt this to let the write fail.
> + */
"zone writes" -> "zoned writes"?
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 88b541e8873f..2c6a317bef7c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -828,6 +828,8 @@ static void blk_complete_request(struct request *req)
> bio = next;
> } while (bio);
>
> + blk_zone_write_complete_request(req);
Same comment here as above: the function name
blk_zone_write_complete_request() is misleading since that function is
called for all request types and not only for zoned writes. Please
rename blk_zone_write_complete_request() into
blk_zone_complete_request().
> + /*
> + * If the plug has a cached request for this queue, try use it.
> + */
try use it -> try to use it (I know this comes from upstream code).
> + if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
> + goto queue_exit;
The order of words in the blk_zone_write_plug_bio() function name seems
unusual to me. How about renaming that function into
blk_zone_plug_write_bio()?
> +/*
> + * Zone write plug flags bits:
Zone write -> zoned write
> + * - BLK_ZONE_WPLUG_PLUGGED: Indicate that the zone write plug is plugged,
Indicate -> Indicates
> +static bool disk_insert_zone_wplug(struct gendisk *disk,
> + struct blk_zone_wplug *zwplug)
> +{
> + struct blk_zone_wplug *zwplg;
> + unsigned long flags;
> + unsigned int idx =
> + hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits);
> +
> + /*
> + * Add the new zone write plug to the hash table, but carefully as we
> + * are racing with other submission context, so we may already have a
> + * zone write plug for the same zone.
> + */
> + spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
> + hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) {
> + if (zwplg->zone_no == zwplug->zone_no) {
> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> + return false;
> + }
> + }
> + hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]);
> + spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> +
> + return true;
> +}
The above code can be made easier to read and more compact by using
guard(spinlock_irqsave)(...) instead of spin_lock_irqsave() +
spin_unlock_irqrestore().
> +static struct blk_zone_wplug *disk_get_zone_wplug(struct gendisk *disk,
> + sector_t sector)
> +{
> + unsigned int zno = disk_zone_no(disk, sector);
> + unsigned int idx = hash_32(zno, disk->zone_wplugs_hash_bits);
> + struct blk_zone_wplug *zwplug;
> +
> + rcu_read_lock();
> +
> + hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[idx], node) {
> + if (zwplug->zone_no == zno &&
> + atomic_inc_not_zero(&zwplug->ref)) {
> + rcu_read_unlock();
> + return zwplug;
> + }
> + }
> +
> + rcu_read_unlock();
> +
> + return NULL;
> +}
The above code can also be made more compact by using guard(rcu)()
instead of rcu_read_lock() + rcu_read_unlock().
> +/*
> + * Get a reference on the write plug for the zone containing @sector.
> + * If the plug does not exist, it is allocated and hashed.
> + * Return a pointer to the zone write plug with the plug spinlock held.
> + */
Holding a spinlock upon return is not my favorite approach. Has the
following alternative been considered?
- Remove the spinlock flags argument from this function and instead add
two other arguments: prev_plug_flags and new_plug_flags.
- If a zone plug is found or allocated, copy the existing zone plug
flags into *prev_plug_flags and set the zone plug flags that have been
passed in new_plug_flags (logical or).
- From blk_revalidate_zone_cb(), pass 0 as the new_plug_flags argument.
- From blk_zone_wplug_handle_write, pass BLK_ZONE_WPLUG_PLUGGED as the
new_plug_flags argument.
Thanks,
Bart.
More information about the Linux-nvme
mailing list