[PATCH v3 00/11] support non power of 2 zoned devices

Pankaj Raghav p.raghav at samsung.com
Mon May 9 04:02:23 PDT 2022


On 2022-05-06 12:00, David Sterba wrote:
>>   The current approach for npo2 devices is to place the superblock mirror
>>   zones near 512GB and 4TB, **aligned to the zone size**.
> 
> I don't like that, the offsets have been chosen so the values are fixed
> and also future proof in case the zone size increases significantly. The
> natural alignment of the pow2 zones makes it fairly trivial.
> 
> If I understand correctly what you suggest, it would mean that if the
> zone is e.g. 5G and starts at 510G then the superblock should start at
> 510G, right? And with another device that has a 7G zone size the nearest
> multiple is 511G. And so on.
> 
> That makes it all less predictable, depending on the physical device
> constraints that are affecting the logical data structures of the
> filesystem. We tried to avoid that with pow2, the only thing that
> depends on the device is that the range from the super block offsets is
> always 2 zones.
> 
> I really want to keep the offsets for all zoned devices the same and
> adapt the code that's handling the writes. This is possible with the
> non-pow2 too, the first write is set to the expected offset, leaving the
> beginning of the zone unused.
> 
I agree. Having a known place for superblocks is important for recovery
tools. We were thinking along the lines of what you have suggested. I
will add this support in the next revision.
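
To make the two placements concrete, here is a minimal Python sketch of
the arithmetic only. It is not kernel code: the function names and the
MIRROR_OFFSETS constant are made up for illustration, "G" is taken to
mean GiB, and zones are assumed to be laid out back to back from offset 0.
How the write path would actually skip the unused part of a zone is not
modeled here.

```python
GiB = 1 << 30

# Nominal mirror offsets discussed in this thread (512GB and 4TB),
# interpreted here as GiB for illustration.
MIRROR_OFFSETS = (512 * GiB, 4096 * GiB)


def zone_aligned_offset(nominal, zone_size):
    """Zone-aligned scheme: round the nominal offset down to the start
    of the zone that contains it (hypothetical helper)."""
    return (nominal // zone_size) * zone_size


def fixed_offset_placement(nominal, zone_size):
    """Fixed-offset scheme: keep the nominal offset and report which
    zone it falls in and how many bytes at the start of that zone
    would be left unused (hypothetical helper)."""
    zone_start = (nominal // zone_size) * zone_size
    unused = nominal - zone_start
    return zone_start, unused


if __name__ == "__main__":
    for zone_gib in (4, 5, 7):
        zone_size = zone_gib * GiB
        aligned = zone_aligned_offset(MIRROR_OFFSETS[0], zone_size)
        zone_start, unused = fixed_offset_placement(MIRROR_OFFSETS[0], zone_size)
        print(f"zone={zone_gib}G aligned_sb={aligned // GiB}G "
              f"fixed_sb=512G zone_start={zone_start // GiB}G "
              f"unused={unused // GiB}G")
```

With a 5G zone the zone-aligned scheme lands on 510G and with a 7G zone
on 511G, matching your examples. The fixed scheme keeps 512G in both
cases and leaves 2G and 1G unused at the start of the containing zone.
For a po2 zone size such as 4G both schemes give 512G with nothing unused.
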
>>   This
>>   is of no issue for normal operation as we keep track of where the
>>   superblock mirrors are placed, but this can cause an issue with recovery
>>   tools for zoned devices as they expect the mirror superblocks to be at
>>   512GB and 4TB.
> 
> Yeah, the tools need to be updated: btrfs-progs and the suite of blk* in
> util-linux.
> 
>>   Note that ATM, recovery tools such as `btrfs check` do not work on
>>   image dumps of zoned devices, even for po2 zone sizes.
> 
> I thought this worked, but if you find something that does not, please
> report that to Johannes or Naohiro.
Ok. Thanks.



More information about the Linux-nvme mailing list