[PATCH v3 11/11] dm-zoned: ensure only power of 2 zone sizes are allowed
Pankaj Raghav
p.raghav at samsung.com
Thu May 12 01:27:38 PDT 2022
>>>> + zone_sectors = bdev_zone_sectors(bdev);
>>>> +
>>>> + if (!is_power_of_2(zone_sectors)) {
>>>
>>> is_power_of_2 takes 'unsigned long' and sector_t is u64, so this is not
>>> 32bit clean and we had an actual bug where value 1<<48 was not
>>> recognized as power of 2.
>>>
>> Good catch. Now I understand why btrfs has a helper for is_power_of_two_u64.
>>
>> But the zone size can never be more than 32bit value so the zone size
>> sect will never greater than unsigned long.
>
> We've set the maximum supported zone size in btrfs to be 8G, which is a
> lot and should be sufficient for some time, but this also means that the
> value is larger than 32bit maximum. I have actually tested btrfs on top
> of such emaulated zoned device via TCMU, so it's not dm-zoned, so it's
> up to you to make sure that a silent overflow won't happen.
>
bdev_zone_sectors is used in this case and not the actual size in bytes.
So the zone size need to be 2TB for the sectors value to cross the 32bit
limit. This is likely not an issue in the near future.
>> With that said, we have two options:
>>
>> 1.) We can put a comment explaining that even though it is 32 bit
>> unsafe, zone size sect can never be a 32bit value
>
> This is probably part of the protocol and specification of the zoned
> devices, the filesystem either accepts the spec or makes some room for
> larger values in case it's not too costly.
>
>> or
>>
>> 2) We should move the btrfs only helper `is_power_of_two_u64` to some
>> common header and use it everywhere.
>
> Yeah, that can be done independently. With some macro magic it can be
> made type-safe for any argument while preserving the 'is_power_of_2'
> name.
But I agree with your point that we need a type safe power of 2
implementation in a common header so that we can avoid silent overflows
in 32 bit architectures.
I will keep the change as is in this patch and follow up on the type
safe power of 2 later independently. Thanks.
More information about the Linux-nvme
mailing list