[PATCH v5 07/28] block: Introduce zone write plugging

Bart Van Assche bvanassche at acm.org
Thu Apr 4 11:31:19 PDT 2024


On 4/3/24 01:42, Damien Le Moal wrote:
> diff --git a/block/bio.c b/block/bio.c
> index d24420ed1c4c..127c06443bca 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1576,6 +1576,12 @@ void bio_endio(struct bio *bio)
>   	if (!bio_integrity_endio(bio))
>   		return;
>   
> +	/*
> +	 * For write BIOs to zoned devices, signal the completion of the BIO so
> +	 * that the next write BIO can be submitted by zone write plugging.
> +	 */
> +	blk_zone_write_bio_endio(bio);
> +
>   	rq_qos_done_bio(bio);

The function name "blk_zone_write_bio_endio()" is misleading since that
function is called for all bio operation types and not only for zoned
write bios. How about renaming blk_zone_write_bio_endio() into 
blk_zone_bio_endio()? If that function is renamed the above comment is
no longer necessary in bio_endio() and can be moved to above the
blk_zone_bio_endio() definition.

> @@ -1003,6 +1007,13 @@ static enum bio_merge_status bio_attempt_front_merge(struct request *req,
> +	/*
> +	 * A front merge for zone writes can happen only if the user submitted
> +	 * writes out of order. Do not attempt this to let the write fail.
> +	 */

"zone writes" -> "zoned writes"?

> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 88b541e8873f..2c6a317bef7c 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -828,6 +828,8 @@ static void blk_complete_request(struct request *req)
>   		bio = next;
>   	} while (bio);
>   
> +	blk_zone_write_complete_request(req);

Same comment here as above: the function name 
blk_zone_write_complete_request() is misleading since that function is
called for all request types and not only for zoned writes. Please
rename blk_zone_write_complete_request() into
blk_zone_complete_request().

> +	/*
> +	 * If the plug has a cached request for this queue, try use it.
> +	 */

try use it -> try to use it (I know this comes from upstream code).

> +	if (blk_queue_is_zoned(q) && blk_zone_write_plug_bio(bio, nr_segs))
> +		goto queue_exit;

The order of words in the blk_zone_write_plug_bio() function name seems
unusual to me. How about renaming that function into
blk_zone_plug_write_bio()?

> +/*
> + * Zone write plug flags bits:

Zone write -> zoned write

> + *  - BLK_ZONE_WPLUG_PLUGGED: Indicate that the zone write plug is plugged,

Indicate -> Indicates

> +static bool disk_insert_zone_wplug(struct gendisk *disk,
> +				   struct blk_zone_wplug *zwplug)
> +{
> +	struct blk_zone_wplug *zwplg;
> +	unsigned long flags;
> +	unsigned int idx =
> +		hash_32(zwplug->zone_no, disk->zone_wplugs_hash_bits);
> +
> +	/*
> +	 * Add the new zone write plug to the hash table, but carefully as we
> +	 * are racing with other submission context, so we may already have a
> +	 * zone write plug for the same zone.
> +	 */
> +	spin_lock_irqsave(&disk->zone_wplugs_lock, flags);
> +	hlist_for_each_entry_rcu(zwplg, &disk->zone_wplugs_hash[idx], node) {
> +		if (zwplg->zone_no == zwplug->zone_no) {
> +			spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> +			return false;
> +		}
> +	}
> +	hlist_add_head_rcu(&zwplug->node, &disk->zone_wplugs_hash[idx]);
> +	spin_unlock_irqrestore(&disk->zone_wplugs_lock, flags);
> +
> +	return true;
> +}

The above code can be made easier to read and more compact by using
guard(spinlock_irqsave)(...) instead of spin_lock_irqsave() + 
spin_unlock_irqrestore().

> +static struct blk_zone_wplug *disk_get_zone_wplug(struct gendisk *disk,
> +						  sector_t sector)
> +{
> +	unsigned int zno = disk_zone_no(disk, sector);
> +	unsigned int idx = hash_32(zno, disk->zone_wplugs_hash_bits);
> +	struct blk_zone_wplug *zwplug;
> +
> +	rcu_read_lock();
> +
> +	hlist_for_each_entry_rcu(zwplug, &disk->zone_wplugs_hash[idx], node) {
> +		if (zwplug->zone_no == zno &&
> +		    atomic_inc_not_zero(&zwplug->ref)) {
> +			rcu_read_unlock();
> +			return zwplug;
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +
> +	return NULL;
> +}

The above code can also be made more compact by using guard(rcu)()
instead of rcu_read_lock() + rcu_read_unlock().

> +/*
> + * Get a reference on the write plug for the zone containing @sector.
> + * If the plug does not exist, it is allocated and hashed.
> + * Return a pointer to the zone write plug with the plug spinlock held.
> + */

Holding a spinlock upon return is not my favorite approach. Has the
following alternative been considered?
- Remove the spinlock flags argument from this function and instead add
   two other arguments: prev_plug_flags and new_plug_flags.
- If a zone plug is found or allocated, copy the existing zone plug
   flags into *prev_plug_flags and set the zone plug flags that have been
   passed in new_plug_flags (logical or).
- From blk_revalidate_zone_cb(), pass 0 as the new_plug_flags argument.
- From blk_zone_wplug_handle_write, pass BLK_ZONE_WPLUG_PLUGGED as the
   new_plug_flags argument.

Thanks,

Bart.



More information about the Linux-nvme mailing list