[PATCH v3 11/11] dm-zoned: ensure only power of 2 zone sizes are allowed

David Sterba dsterba at suse.cz
Wed May 11 09:00:02 PDT 2022


On Wed, May 11, 2022 at 04:39:17PM +0200, Pankaj Raghav wrote:
> Hi David,
> 
> On 2022-05-09 20:54, David Sterba wrote:
> >> diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
> >> index 3e7b1fe15..27dc4ddf2 100644
> >> --- a/drivers/md/dm-zone.c
> >> +++ b/drivers/md/dm-zone.c
> >> @@ -231,6 +231,18 @@ static int dm_revalidate_zones(struct mapped_device *md, struct dm_table *t)
> >>  	struct request_queue *q = md->queue;
> >>  	unsigned int noio_flag;
> >>  	int ret;
> >> +	struct block_device *bdev = md->disk->part0;
> >> +	sector_t zone_sectors;
> >> +	char bname[BDEVNAME_SIZE];
> >> +
> >> +	zone_sectors = bdev_zone_sectors(bdev);
> >> +
> >> +	if (!is_power_of_2(zone_sectors)) {
> > 
> > is_power_of_2 takes 'unsigned long' and sector_t is u64, so this is not
> > 32bit clean and we had an actual bug where value 1<<48 was not
> > recognized as power of 2.
> > 
> Good catch. Now I understand why btrfs has a helper for is_power_of_two_u64.
> 
> But the zone size can never be more than a 32bit value, so the zone
> size in sectors will never be greater than an unsigned long.

We've set the maximum supported zone size in btrfs to be 8G, which is a
lot and should be sufficient for some time, but this also means that the
value is larger than the 32bit maximum. I have actually tested btrfs on
top of such an emulated zoned device via TCMU, not dm-zoned, so it's up
to you to make sure that a silent overflow won't happen.
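
To illustrate the kind of silent truncation meant here, a minimal
userspace sketch follows. It is not kernel code: is_power_of_2_ulong32()
is a hypothetical stand-in that uses uint32_t to model 'unsigned long' on
a 32-bit build, and is_power_of_two_u64() only mirrors the name of the
btrfs helper, with the body being an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for is_power_of_2() on a 32-bit build, where
 * 'unsigned long' is 32 bits wide. uint32_t makes the effect visible on
 * any host.
 */
static bool is_power_of_2_ulong32(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* 64-bit variant, named after the btrfs helper (body is an assumption). */
static bool is_power_of_two_u64(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Exercises both helpers with a value wider than 32 bits. */
static void truncation_demo(void)
{
	uint64_t v = UINT64_C(1) << 48;

	/*
	 * The cast spells out the narrowing that the compiler would
	 * otherwise do implicitly: only the low 32 bits survive, which
	 * are all zero, so the check fails.
	 */
	assert(!is_power_of_2_ulong32((uint32_t)v));

	/* The full 64-bit value is recognized correctly. */
	assert(is_power_of_two_u64(v));
}
```

The same happens for 8G expressed in bytes (1 << 33), while the same
size expressed in 512-byte sectors (1 << 24) still fits in 32 bits.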

> With that said, we have two options:
> 
> 1) We can put a comment explaining that even though it is 32 bit
> unsafe, zone size sect can never exceed a 32bit value

This is probably part of the protocol and specification of the zoned
devices; the filesystem either accepts the spec or makes some room for
larger values in case it's not too costly.

> or
> 
> 2) We should move the btrfs only helper `is_power_of_two_u64` to some
> common header and use it everywhere.

Yeah, that can be done independently. With some macro magic it can be
made type-safe for any argument while preserving the 'is_power_of_2'
name.
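
One possible shape of such a macro, as a rough sketch only: it relies on
GNU C extensions (statement expressions and __typeof__), is meant for
unsigned arguments, and uses the hypothetical name is_power_of_2_generic
so that it does not clash with the existing helper. This is not a
proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical type-generic check. The temporary takes the type of the
 * argument, so a 64-bit value is never narrowed, and the argument is
 * evaluated exactly once.
 */
#define is_power_of_2_generic(n)				\
({								\
	__typeof__(n) n_ = (n);					\
	(bool)(n_ != 0 && (n_ & (n_ - 1)) == 0);		\
})

/* Usage example with 32-bit and 64-bit arguments. */
static void generic_demo(void)
{
	uint32_t small = 4096;
	uint64_t large = UINT64_C(1) << 48;

	assert(is_power_of_2_generic(small));
	assert(is_power_of_2_generic(large));
	assert(!is_power_of_2_generic(large + 1));
}
```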


