[PATCH 09/13] block: introduce blkdev_report_zones_cached()

Bart Van Assche bvanassche at acm.org
Fri Oct 31 14:53:00 PDT 2025


On 10/30/25 11:13 PM, Damien Le Moal wrote:
> Introduce the function blkdev_report_zones_cached() to provide a fast
> report zone built using the blkdev_get_zone_info() function, which gets
> zone information from a disk zones_cond array or zone write plugs.
> For a large capacity SMR drive, such fast report zone can be completed
> in a few millioseconds compared to several seconds completion times
> when the report zone is obtained from the device.

millioseconds -> milliseconds

Does retrieving the cached zone information really require multiple
milliseconds instead of only a few microseconds?

> For zoned device that do not use zone write plug resources,

zoned device -> zoned devices

> +static inline bool disk_need_zone_resources(struct gendisk *disk)
> +{
> +	/*
> +	 * All mq zoned devices need zone resources so that the block layer
> +	 * can automatically handle write BIO plugging. BIO-based device drivers
> +	 * (e.g. DM devices) are normally responsible for handling zone write
> +	 * ordering and do not need zone resources, unless the driver requires
> +	 * zone append emulation.
> +	 */
> +	return queue_is_mq(disk->queue) ||
> +		queue_emulates_zone_append(disk->queue);
> +}

Today queue_is_mq() returns true for request-based queues only. Since
"request-based" is the terminology used elsewhere in the block layer,
maybe change "mq zoned devices" into "request-based zoned block devices"?

>   static inline unsigned int disk_zone_wplugs_hash_size(struct gendisk *disk)
>   {
>   	return 1U << disk->zone_wplugs_hash_bits;
> @@ -962,6 +975,68 @@ int blkdev_get_zone_info(struct block_device *bdev, sector_t sector,
>   }
>   EXPORT_SYMBOL_GPL(blkdev_get_zone_info);
>   
> +/**
> + * blkdev_report_zones_cached - Get cached zones information
> + * @bdev:     Target block device
> + * @sector:   Sector from which to report zones
> + * @nr_zones: Maximum number of zones to report
> + * @cb:       Callback function called for each reported zone
> + * @data:     Private data for the callback function
> + *
> + * Description:
> + *    Similar to blkdev_report_zones() but instead of calling into the low level
> + *    device driver to get the zone report from the device, use
> + *    blkdev_get_zone_info() to generate the report from the disk zone write
> + *    plugs and zones condition array. Since calling this function without a
> + *    callback does not make sense, @cb must be specified.
> + */
> +int blkdev_report_zones_cached(struct block_device *bdev, sector_t sector,
> +			unsigned int nr_zones, report_zones_cb cb, void *data)
> +{
> +	struct gendisk *disk = bdev->bd_disk;
> +	sector_t capacity = get_capacity(disk);
> +	sector_t zone_sectors = bdev_zone_sectors(bdev);
> +	unsigned int idx = 0;
> +	struct blk_zone zone;
> +	int ret;
> +
> +	if (!cb || !bdev_is_zoned(bdev) ||
> +	    WARN_ON_ONCE(!disk->fops->report_zones))
> +		return -EOPNOTSUPP;
> +
> +	if (!nr_zones || sector >= capacity)
> +		return 0;
> +
> +	/*
> +	 * If we do not have any zone write plug resources, fallback to using
> +	 * the regular zone report.
> +	 */
> +	if (!disk_need_zone_resources(disk)) {
> +		struct blk_report_zones_args args = {
> +			.cb = cb,
> +			.data = data,
> +			.report_active = true,
> +		};
> +
> +		return blkdev_do_report_zones(bdev, sector, nr_zones, &args);
> +	}
> +
> +	for (sector = ALIGN(sector, zone_sectors);
> +	     sector < capacity && idx < nr_zones;
> +	     sector += zone_sectors, idx++) {

Please change "sector = ALIGN(sector, zone_sectors)" into something
based on bdev_offset_from_zone_start(), e.g. the following code:

	sector += zone_sectors - 1;
	sector -= bdev_offset_from_zone_start(bdev, sector);
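
To illustrate the rounding, here is a minimal userspace sketch rather
than kernel code. It assumes a power-of-two zone size, and
offset_from_zone_start(), align_up() and zone_round_up() are
hypothetical stand-ins for bdev_offset_from_zone_start(), ALIGN() and
the two statements above. Under that assumption both approaches round a
sector up to the start of the next zone, and leave a sector that is
already zone-aligned unchanged:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's sector_t. */
typedef uint64_t sector_t;

/*
 * Hypothetical stand-in for bdev_offset_from_zone_start(): offset of
 * @sector within its zone. The mask only works for power-of-two zone
 * sizes, which is an assumption of this sketch.
 */
static sector_t offset_from_zone_start(sector_t zone_sectors, sector_t sector)
{
	return sector & (zone_sectors - 1);
}

/* Round up the way ALIGN(sector, zone_sectors) does for a power of two. */
static sector_t align_up(sector_t zone_sectors, sector_t sector)
{
	return (sector + zone_sectors - 1) & ~(zone_sectors - 1);
}

/* Round up using the two statements suggested above. */
static sector_t zone_round_up(sector_t zone_sectors, sector_t sector)
{
	/* This sketch only handles non-zero power-of-two zone sizes. */
	assert(zone_sectors && !(zone_sectors & (zone_sectors - 1)));

	sector += zone_sectors - 1;
	sector -= offset_from_zone_start(zone_sectors, sector);
	return sector;
}
```

With 524288-sector zones, for example, sector 1 is rounded up to 524288
by both helpers, while sector 524288 stays where it is.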

Thanks,

Bart.