[PATCH v3 00/11] support non power of 2 zoned devices

David Sterba dsterba at suse.cz
Fri May 6 03:00:55 PDT 2022


On Fri, May 06, 2022 at 10:10:54AM +0200, Pankaj Raghav wrote:
> - Open issue:
> * btrfs superblock location for zoned devices is expected to be in 0,
>   512GB(mirror) and 4TB(mirror) in the device. Zoned devices with po2
>   zone size will naturally align with these superblock location but non
>   po2 devices will not align with 512GB and 4TB offset.
> 
>   The current approach for npo2 devices is to place the superblock mirror
>   zones near 512GB and 4TB that is **aligned to the zone size**.

I don't like that. The offsets were chosen so that the values are fixed
and also future-proof in case the zone size increases significantly. The
natural alignment of the pow2 zones makes it fairly trivial.

If I understand correctly what you suggest, it would mean that if the
zone size is e.g. 5G and a zone starts at 510G, then the superblock
should start at 510G, right? And with another device that has a 7G zone
size the nearest multiple is 511G. And so on.
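
To make the arithmetic concrete, here is a minimal sketch in plain Python
(not btrfs code). The helper name zone_aligned_offset is made up for
illustration, and rounding down to a zone boundary is an assumption that
happens to match both examples above; the patch set may pick the
boundary differently.

```python
GiB = 1 << 30

def zone_aligned_offset(nominal, zone_size):
    """Round the nominal superblock offset down to a multiple of zone_size.

    Hypothetical helper: rounding down is an assumption, not taken from
    the patch set.
    """
    return (nominal // zone_size) * zone_size

# The 512G mirror lands on a different offset for each zone size.
for zone_gib in (4, 5, 7):
    off = zone_aligned_offset(512 * GiB, zone_gib * GiB)
    print(f"{zone_gib}G zones -> mirror at {off // GiB}G")
```

With 4G (pow2) zones the result stays at 512G, while 5G and 7G zones
give 510G and 511G, so the location depends on the device.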

That makes it all less predictable, depending on the physical device
constraints that are affecting the logical data structures of the
filesystem. We tried to avoid that with pow2, the only thing that
depends on the device is that the range from the super block offsets is
always 2 zones.

I really want to keep the offsets for all zoned devices the same and
adapt the code that's handling the writes. This is possible with
non-pow2 zones too: the first write is set to the expected offset,
leaving the beginning of the zone unused.
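
A rough sketch of what keeping the fixed offset implies, again in plain
Python rather than btrfs code. The helper name locate_fixed_offset is
hypothetical; it only shows which zone contains the fixed offset and how
much of that zone would be left unused before the first write.

```python
GiB = 1 << 30

def locate_fixed_offset(offset, zone_size):
    """Return (zone index, bytes skipped at the start of that zone).

    Hypothetical helper: the fixed superblock offset stays as it is, and
    the first write in the containing zone starts at that offset.
    """
    return divmod(offset, zone_size)

# The 512G offset is the same for every device; only the gap differs.
for zone_gib in (4, 5, 7):
    zone, skipped = locate_fixed_offset(512 * GiB, zone_gib * GiB)
    print(f"{zone_gib}G zones: zone {zone}, {skipped // GiB}G unused")
```

For 4G zones nothing is skipped, for 5G zones the first 2G of zone 102
stay unused, and for 7G zones the first 1G of zone 73.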

>   This
>   is of no issue for normal operation as we keep track where the superblock
>   mirror are placed but this can cause an issue with recovery tools for
>   zoned devices as they expect mirror superblock to be in 512GB and 4TB.

Yeah, the tools need to be updated: btrfs-progs and the suite of blk* in
util-linux.

>   Note that ATM, recovery tools such as `btrfs check` do not work for
>   image dumps for zoned devices even for po2 zone sizes.

I thought this worked, but if you find something that does not, please
report that to Johannes or Naohiro.


