[PATCH v2 11/15] block: introduce BLKREPORTZONESV2 ioctl
Damien Le Moal
dlemoal at kernel.org
Mon Nov 3 15:01:29 PST 2025
On 11/4/25 07:12, Bart Van Assche wrote:
> On 11/3/25 7:17 AM, Johannes Thumshirn wrote:
>> On 11/3/25 2:38 PM, Damien Le Moal wrote:
>>> Introduce the new BLKREPORTZONESV2 ioctl command to allow user
>>> applications access to the fast zone report implemented by
>>> blkdev_report_zones_cached(). This new ioctl is defined as number 142
>>> and is documented in include/uapi/linux/fs.h.
>>>
>>> Unlike the existing BLKREPORTZONE ioctl, this new ioctl uses the flags
>>> field of struct blk_zone_report also as an input. If the user sets the
>>> BLK_ZONE_REP_CACHED flag as an input, then blkdev_report_zones_cached()
>>> is used to generate the zone report using cached zone information. If
>>> this flag is not set, then BLKREPORTZONESV2 behaves in the same manner
>>> as BLKREPORTZONE and the zone report is generated by accessing the
>>> zoned device.
>>
>> Is there a downside to always do the caching? A.k.a do we need the new
>> ioctl or can we keep using the old one and cache the report zones reply?
>
> Hi Damien and Johannes,
>
> I have a different proposal, namely not to introduce BLKREPORTZONESV2 at
> all. If we keep the BLKREPORTZONE ioctl and do not introduce the
> BLKREPORTZONESV2 ioctl then in the kernel we only have to cache zone
> information that will be used by filesystems. Information that won't be
> used by filesystems doesn't have to be cached. With this approach the
> existing data structures are sufficient (struct blk_zone_wplug and
> conv_zones_bitmap) and we don't need to introduce new data structures
> for tracking zone information.

See the XFS and btrfs mount code.
For example, for XFS: xfs_mount_zones() -> xfs_get_zone_info_cb() -> xfs_init_zone().
Zone type, condition and write pointer are used. That is about everything in the
zone report, and to generate it we need (1) the zone condition and (2) the zone
write pointer offset. Both are available from zone write plugs. When we do not
have a zone write plug, we need only the zone condition (1), which allows us to
infer (2). The zone type can always be inferred from the zone condition, so it
is not cached.
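To make the inference described above concrete, here is a minimal user-space model in C. The struct cached_zone type and the synth_zone() helper are hypothetical illustrations, not the kernel's data structures or code. Reporting the write pointer at the zone end for full and conventional zones, setting the zone capacity equal to the zone length, and treating every write-pointer zone as sequential-write-required (a host-managed device) are all assumptions of this sketch.

```c
#include <linux/blkzoned.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical per-zone cached state (a model of the idea only, NOT the
 * kernel's struct blk_zone_wplug or conv_zones_bitmap).
 */
struct cached_zone {
	__u64 start;     /* zone start sector */
	__u64 len;       /* zone length in sectors */
	__u8 cond;       /* cached zone condition (BLK_ZONE_COND_*) */
	bool has_wplug;  /* true if a zone write plug exists for this zone */
	__u64 wp_offset; /* write pointer offset, valid only if has_wplug */
};

/*
 * Fill *out from cached state only, without querying the device.
 * Returns 0 on success, or -1 if the write pointer cannot be inferred.
 */
static int synth_zone(const struct cached_zone *cz, struct blk_zone *out)
{
	memset(out, 0, sizeof(*out));
	out->start = cz->start;
	out->len = cz->len;
	out->capacity = cz->len; /* assumption: capacity == len */
	out->cond = cz->cond;

	/* Zone type inferred from condition (assumes a host-managed device). */
	out->type = (cz->cond == BLK_ZONE_COND_NOT_WP) ?
		BLK_ZONE_TYPE_CONVENTIONAL : BLK_ZONE_TYPE_SEQWRITE_REQ;

	/* A write plug carries the write pointer offset directly. */
	if (cz->has_wplug) {
		out->wp = cz->start + cz->wp_offset;
		return 0;
	}

	/* No write plug: infer the write pointer from the condition. */
	switch (cz->cond) {
	case BLK_ZONE_COND_EMPTY:
		out->wp = cz->start;
		return 0;
	case BLK_ZONE_COND_FULL:
	case BLK_ZONE_COND_NOT_WP:
		out->wp = cz->start + cz->len; /* assumption: wp at zone end */
		return 0;
	default:
		return -1; /* e.g. a closed zone with no write plug */
	}
}
```

In this model a partially written zone without a write plug returns -1, meaning its write pointer could not be produced from the cached state alone.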

So we are already caching the *minimum* amount of data needed, and that data
allows us to generate a near-perfect zone report without interrogating the
drive. We are not caching any information that won't be used by filesystems.
This is already optimal.

BLKREPORTZONESV2 lets users also benefit from a faster zone report for tools
such as mkfs (formatting a large RAID volume takes a long time because of the
zone reports issued to all drives). Removing it would be counterproductive.
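As a sketch of how a mkfs-like tool might consume this, the C function below walks a whole zoned device in batches and counts conventional and sequential zones. It requests a cached report only when the installed UAPI headers define both BLKREPORTZONESV2 and BLK_ZONE_REP_CACHED, and otherwise falls back to the existing BLKREPORTZONE ioctl. The helper name count_zones() and the batching scheme are illustrative assumptions, not part of the patch.

```c
#include <linux/blkzoned.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: report all zones of the device open on fd, nr_batch
 * zones per ioctl call, and count conventional vs. sequential zones.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int count_zones(int fd, __u32 nr_batch,
		       unsigned long *nr_conv, unsigned long *nr_seq)
{
	size_t sz = sizeof(struct blk_zone_report) +
		    (size_t)nr_batch * sizeof(struct blk_zone);
	struct blk_zone_report *rep;
	__u64 sector = 0;

	*nr_conv = 0;
	*nr_seq = 0;
	rep = malloc(sz);
	if (!rep)
		return -1;

	for (;;) {
		int ret;

		memset(rep, 0, sz);
		rep->sector = sector;
		rep->nr_zones = nr_batch;
#if defined(BLKREPORTZONESV2) && defined(BLK_ZONE_REP_CACHED)
		/* flags is an input for the new ioctl: ask for a cached report */
		rep->flags = BLK_ZONE_REP_CACHED;
		ret = ioctl(fd, BLKREPORTZONESV2, rep);
#else
		/* older headers: the zone report accesses the device */
		ret = ioctl(fd, BLKREPORTZONE, rep);
#endif
		if (ret < 0) {
			int err = errno;

			free(rep);
			errno = err;
			return -1;
		}
		if (rep->nr_zones == 0)
			break; /* start sector is past the last zone */

		for (__u32 i = 0; i < rep->nr_zones; i++) {
			if (rep->zones[i].type == BLK_ZONE_TYPE_CONVENTIONAL)
				(*nr_conv)++;
			else
				(*nr_seq)++;
		}

		/* Continue right after the last zone reported in this batch. */
		sector = rep->zones[rep->nr_zones - 1].start +
			 rep->zones[rep->nr_zones - 1].len;
	}

	free(rep);
	return 0;
}
```

Call it with a file descriptor for the zoned block device opened read-only. Each loop iteration issues one ioctl, which is the per-call cost a cached report is meant to reduce when many large drives are formatted.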
--
Damien Le Moal
Western Digital Research
More information about the Linux-nvme mailing list