[PATCH v3 07/15] block: track zone conditions

Chaitanya Kulkarni chaitanyak at nvidia.com
Mon Nov 3 20:08:38 PST 2025


On 11/3/25 17:31, Damien Le Moal wrote:
> The function blk_revalidate_zone_cond() already caches the condition of
> all zones of a zoned block device in the zones_cond array of a gendisk.
> However, the zone conditions are updated only when the device is scanned
> or revalidated.
>
> Implement tracking of the runtime changes to zone conditions using
> the new cond field in struct blk_zone_wplug. The size of this structure
> remains 112 Bytes as the new field replaces the 4 Bytes padding at the
> end of the structure.
>
> Because zones that do not have a zone write plug can be in the empty,
> implicit open, explicit open or full condition, the zones_cond array of
> a disk is used to track the conditions of zones that do not have a zone
> write plug. The condition of such a zone is updated in the disk zones_cond
> array when a zone reset, reset all or finish operation is executed, and
> also when a zone write plug is removed from the disk hash table when the
> zone becomes full.
>
> Since a device may automatically close an implicitly open zone when
> writing to an empty or closed zone, if the total number of open zones
> has reached the device limit, the BLK_ZONE_COND_IMP_OPEN and
> BLK_ZONE_COND_CLOSED zone conditions cannot be precisely tracked. To
> overcome this, the zone condition BLK_ZONE_COND_ACTIVE is introduced to
> represent a zone that has the condition BLK_ZONE_COND_IMP_OPEN,
> BLK_ZONE_COND_EXP_OPEN or BLK_ZONE_COND_CLOSED.  This follows the
> definition of an active zone as defined in the NVMe Zoned Namespace
> specifications. As such, for a zoned device that has a limit on the
> maximum number of open zones, we will never have more zones in the
> BLK_ZONE_COND_ACTIVE condition than the device limit. This is compatible
> with the SCSI ZBC and ATA ZAC specifications for SMR HDDs as these
> devices do not have a limit on the number of active zones.
>
> The function disk_zone_wplug_set_wp_offset() is modified to use the new
> helper disk_zone_wplug_update_cond() to update a zone write plug
> condition whenever a zone write plug write offset is updated on
> submission or merging of write BIOs to a zone.
>
> The functions blk_zone_reset_bio_endio(), blk_zone_reset_all_bio_endio()
> and blk_zone_finish_bio_endio() are modified to update the condition of
> the zones targeted by reset, reset_all and finish operations, either
> through disk_zone_wplug_set_wp_offset() for zones that have a
> zone write plug, or using the disk_zone_set_cond() helper to update the
> zones_cond array of the disk for zones that do not have a zone write
> plug.
>
> When a zone write plug is removed from the disk hash table (when the
> zone becomes empty or full), the condition of struct blk_zone_wplug is
> used to update the disk zones_cond array. Conversely, when a zone write
> plug is added to the disk hash table, the zones_cond array is used to
> initialize the zone write plug condition.
>
> Signed-off-by: Damien Le Moal<dlemoal at kernel.org>
> Reviewed-by: Christoph Hellwig<hch at lst.de>
> Reviewed-by: Johannes Thumshirn<johannes.thumshirn at wdc.com>


Looks good.
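
For anyone following along, the condition tracking described in the commit
message can be sketched roughly as below. This is a standalone illustration
and not the patch code: the zc_* names are hypothetical stand-ins for the
BLK_ZONE_COND_* values, and deriving the condition from the write pointer
offset and zone capacity is my assumption about what a helper like
disk_zone_wplug_update_cond() has to do.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's BLK_ZONE_COND_* values. */
enum zc {
	ZC_EMPTY,
	ZC_IMP_OPEN,
	ZC_EXP_OPEN,
	ZC_CLOSED,
	ZC_FULL,
	ZC_ACTIVE,	/* implicit open, explicit open or closed */
};

/*
 * Collapse the conditions that cannot be tracked precisely (the device may
 * implicitly close an open zone on its own) into the single active condition.
 */
static enum zc zc_collapse(enum zc cond)
{
	switch (cond) {
	case ZC_IMP_OPEN:
	case ZC_EXP_OPEN:
	case ZC_CLOSED:
		return ZC_ACTIVE;
	default:
		return cond;
	}
}

/*
 * Assumed derivation of a zone condition from its write pointer offset:
 * nothing written means empty, written up to capacity means full, anything
 * in between is active.
 */
static enum zc zc_from_wp_offset(uint64_t wp_offset, uint64_t zone_capacity)
{
	if (wp_offset == 0)
		return ZC_EMPTY;
	if (wp_offset >= zone_capacity)
		return ZC_FULL;
	return ZC_ACTIVE;
}
```

With this model, a reset maps to a write pointer offset of 0 (empty) and a
finish maps to an offset equal to the zone capacity (full), which matches
the cases the endio handlers need to cover.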

Reviewed-by: Chaitanya Kulkarni <kch at nvidia.com>

-ck




