[PATCH 07/13] block: track zone conditions

Damien Le Moal dlemoal at kernel.org
Mon Nov 3 14:34:30 PST 2025


On 11/4/25 03:31, Bart Van Assche wrote:
> On 11/3/25 7:48 AM, Bart Van Assche wrote:
>> On 11/2/25 10:05 PM, Damien Le Moal wrote:
>>> On 11/1/25 06:17, Bart Van Assche wrote:
>>>> On 10/30/25 11:13 PM, Damien Le Moal wrote:
>>>>> Implement tracking of the runtime changes to zone conditions using
>>>>> the new cond field in struct blk_zone_wplug. The size of this structure
>>>>> remains 112 Bytes as the new field replaces the 4 Bytes padding at the
>>>>> end of the structure. For zones that do not have a zone write plug, the
>>>>> zones_cond array of a disk is used to track changes to zone conditions,
>>>>> e.g. when a zone reset, reset all or finish operation is executed.
>>>>
>>>> Why is it necessary to track the condition of sequential zones that do
>>>> not have a zone write plug? Please explain what the use cases are.
>>>
>>> Because zones that do not have a zone write plug can be empty OR full.
>>
>> Why does the block layer have to track this information? Filesystems can
>> easily derive this information from the filesystem metadata information,
>> isn't it?
> 
> (replying to my own email)
> 
> Is this a good way to check what zone type information filesystems need?
> 
> $ git grep -nH BLK_ZONE_TYPE_ fs
> fs/btrfs/zoned.c:96:		ASSERT(zones[i].type != BLK_ZONE_TYPE_CONVENTIONAL);
> fs/btrfs/zoned.c:211:		zones[i].type = BLK_ZONE_TYPE_CONVENTIONAL;
> fs/btrfs/zoned.c:488:			if (zones[i].type == BLK_ZONE_TYPE_SEQWRITE_REQ)
> fs/btrfs/zoned.c:566:		    BLK_ZONE_TYPE_CONVENTIONAL)
> fs/btrfs/zoned.c:815:	if (zones[0].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> fs/btrfs/zoned.c:1360:	if (unlikely(zone.type == BLK_ZONE_TYPE_CONVENTIONAL)) {
> fs/f2fs/segment.c:5295:	if (zone->type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> fs/f2fs/segment.c:5417:	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> fs/f2fs/segment.c:5473:	if (zone.type != BLK_ZONE_TYPE_SEQWRITE_REQ)
> fs/f2fs/super.c:4332:	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> fs/xfs/libxfs/xfs_zones.c:177:	case BLK_ZONE_TYPE_CONVENTIONAL:
> fs/xfs/libxfs/xfs_zones.c:179:	case BLK_ZONE_TYPE_SEQWRITE_REQ:
> fs/zonefs/super.c:385:		zone.type = BLK_ZONE_TYPE_CONVENTIONAL;
> fs/zonefs/super.c:874:	case BLK_ZONE_TYPE_CONVENTIONAL:
> fs/zonefs/super.c:886:	case BLK_ZONE_TYPE_SEQWRITE_REQ:
> fs/zonefs/super.c:887:	case BLK_ZONE_TYPE_SEQWRITE_PREF:
> fs/zonefs/zonefs.h:26: * defined in linux/blkzoned.h, that is, BLK_ZONE_TYPE_SEQWRITE_REQ and
> fs/zonefs/zonefs.h:27: * BLK_ZONE_TYPE_SEQWRITE_PREF.
> fs/zonefs/zonefs.h:37:	if (zone->type == BLK_ZONE_TYPE_CONVENTIONAL)
> 
> In the above I see that all filesystems check for the following zone
> types and don't check whether a zone is empty or full:
> * CONVENTIONAL
> * SEQWRITE_REQ
> * SEQWRITE_PREF
> 
> Do you agree with this conclusion?

Absolutely not.

$ git grep -nH BLK_ZONE_COND_ fs
fs/btrfs/zoned.c:75:    return (zone->cond == BLK_ZONE_COND_FULL) ||
fs/btrfs/zoned.c:97:            empty[i] = (zones[i].cond == BLK_ZONE_COND_EMPTY);
fs/btrfs/zoned.c:212:           zones[i].cond = BLK_ZONE_COND_NOT_WP;
fs/btrfs/zoned.c:491:                   case BLK_ZONE_COND_EMPTY:
fs/btrfs/zoned.c:494:                   case BLK_ZONE_COND_IMP_OPEN:
fs/btrfs/zoned.c:495:                   case BLK_ZONE_COND_EXP_OPEN:
fs/btrfs/zoned.c:496:                   case BLK_ZONE_COND_CLOSED:
fs/btrfs/zoned.c:497:                   case BLK_ZONE_COND_ACTIVE:
fs/btrfs/zoned.c:833:           if (reset && reset->cond != BLK_ZONE_COND_EMPTY) {
fs/btrfs/zoned.c:845:                   reset->cond = BLK_ZONE_COND_EMPTY;
fs/btrfs/zoned.c:967:           if (zone->cond == BLK_ZONE_COND_FULL) {
fs/btrfs/zoned.c:972:           if (zone->cond == BLK_ZONE_COND_EMPTY)
fs/btrfs/zoned.c:973:                   zone->cond = BLK_ZONE_COND_IMP_OPEN;
fs/btrfs/zoned.c:1000:                  zone->cond = BLK_ZONE_COND_FULL;
fs/btrfs/zoned.c:1373:  case BLK_ZONE_COND_OFFLINE:
fs/btrfs/zoned.c:1374:  case BLK_ZONE_COND_READONLY:
fs/btrfs/zoned.c:1381:  case BLK_ZONE_COND_EMPTY:
fs/btrfs/zoned.c:1384:  case BLK_ZONE_COND_FULL:
fs/f2fs/segment.c:5319: if ((!valid_block_cnt && zone->cond == BLK_ZONE_COND_EMPTY) ||
fs/f2fs/segment.c:5320:     (valid_block_cnt && zone->cond == BLK_ZONE_COND_FULL))
fs/xfs/libxfs/xfs_zones.c:93:   case BLK_ZONE_COND_EMPTY:
fs/xfs/libxfs/xfs_zones.c:95:   case BLK_ZONE_COND_IMP_OPEN:
fs/xfs/libxfs/xfs_zones.c:96:   case BLK_ZONE_COND_EXP_OPEN:
fs/xfs/libxfs/xfs_zones.c:97:   case BLK_ZONE_COND_CLOSED:
fs/xfs/libxfs/xfs_zones.c:99:   case BLK_ZONE_COND_FULL:
fs/xfs/libxfs/xfs_zones.c:101:  case BLK_ZONE_COND_NOT_WP:
fs/xfs/libxfs/xfs_zones.c:102:  case BLK_ZONE_COND_OFFLINE:
fs/xfs/libxfs/xfs_zones.c:103:  case BLK_ZONE_COND_READONLY:
fs/xfs/libxfs/xfs_zones.c:122:  case BLK_ZONE_COND_NOT_WP:
fs/xfs/xfs_zone_alloc.c:985:    if (!zone || zone->cond == BLK_ZONE_COND_NOT_WP) {
fs/zonefs/super.c:195:  case BLK_ZONE_COND_OFFLINE:
fs/zonefs/super.c:200:  case BLK_ZONE_COND_READONLY:
fs/zonefs/super.c:215:  case BLK_ZONE_COND_FULL:
fs/zonefs/super.c:386:          zone.cond = BLK_ZONE_COND_NOT_WP;
fs/zonefs/super.c:986:                          if (next->cond == BLK_ZONE_COND_READONLY &&
fs/zonefs/super.c:987:                              zone->cond != BLK_ZONE_COND_OFFLINE)
fs/zonefs/super.c:988:                                  zone->cond = BLK_ZONE_COND_READONLY;
fs/zonefs/super.c:989:                          else if (next->cond == BLK_ZONE_COND_OFFLINE)
fs/zonefs/super.c:990:                                  zone->cond = BLK_ZONE_COND_OFFLINE;
fs/zonefs/super.c:1034:             (zone->cond == BLK_ZONE_COND_IMP_OPEN ||
fs/zonefs/super.c:1035:              zone->cond == BLK_ZONE_COND_EXP_OPEN)) {

And if you are still not convinced, read the mount code for XFS and BTRFS.
You'll see the point of having a fast, cached zone report to speed that up.
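
To illustrate the kind of work done at mount time, here is a minimal userspace
sketch. It is not the actual XFS or BTRFS code. The struct zone_info and
struct mount_summary types and the summarize_zones() helper are hypothetical
names made up for this example, and the ZT_*/ZC_* constants are local stand-ins
that mirror the BLK_ZONE_TYPE_* and BLK_ZONE_COND_* values of linux/blkzoned.h
so that the sketch builds without kernel headers. The point is only that the
classification depends on the zone condition, not just the zone type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins mirroring BLK_ZONE_TYPE_* from linux/blkzoned.h. */
enum zone_type {
	ZT_CONVENTIONAL  = 0x1,
	ZT_SEQWRITE_REQ  = 0x2,
	ZT_SEQWRITE_PREF = 0x3,
};

/* Local stand-ins mirroring BLK_ZONE_COND_* from linux/blkzoned.h. */
enum zone_cond {
	ZC_NOT_WP   = 0x0,
	ZC_EMPTY    = 0x1,
	ZC_IMP_OPEN = 0x2,
	ZC_EXP_OPEN = 0x3,
	ZC_CLOSED   = 0x4,
	ZC_READONLY = 0xD,
	ZC_FULL     = 0xE,
	ZC_OFFLINE  = 0xF,
};

/* Hypothetical, simplified zone descriptor (not struct blk_zone). */
struct zone_info {
	uint64_t start;	/* zone start sector */
	uint64_t wp;	/* write pointer position */
	uint8_t type;	/* enum zone_type */
	uint8_t cond;	/* enum zone_cond */
};

/* Hypothetical per-device counters a filesystem might build at mount. */
struct mount_summary {
	size_t conventional;	/* no write pointer, randomly writable */
	size_t empty;		/* sequential, nothing written yet */
	size_t active;		/* sequential, partially written */
	size_t full;		/* sequential, completely written */
	size_t unusable;	/* read-only, offline or inconsistent */
};

/* Classify every zone of a report by its type and condition. */
static void summarize_zones(const struct zone_info *zones, size_t nr,
			    struct mount_summary *s)
{
	assert(s != NULL);
	assert(zones != NULL || nr == 0);

	memset(s, 0, sizeof(*s));
	for (size_t i = 0; i < nr; i++) {
		if (zones[i].type == ZT_CONVENTIONAL) {
			s->conventional++;
			continue;
		}
		switch (zones[i].cond) {
		case ZC_EMPTY:
			s->empty++;
			break;
		case ZC_IMP_OPEN:
		case ZC_EXP_OPEN:
		case ZC_CLOSED:
			s->active++;
			break;
		case ZC_FULL:
			s->full++;
			break;
		default:
			/* READONLY, OFFLINE, or NOT_WP on a sequential zone. */
			s->unusable++;
			break;
		}
	}
}
```

With a full zone report as input, every zone of the device goes through such a
switch on each mount, which is why the cost of obtaining the report matters.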


-- 
Damien Le Moal
Western Digital Research


