[Report] blk-zoned/ZNS: non_power_of_2 of zone->len

Damien Le Moal dlemoal at kernel.org
Thu Jan 11 19:05:45 PST 2024


On 1/12/24 10:13, Ming Lei wrote:
> Hello Damien and Guys,
> 
> Yi reported that the following failure:
> 
> Oct 18 15:24:15 localhost kernel: nvme nvme4: invalid zone size:196608 for namespace:1
> Oct 18 15:24:33 localhost smartd[2303]: Device: /dev/nvme4, opened
> Oct 18 15:24:33 localhost smartd[2303]: Device: /dev/nvme4, NETAPPX4022S173A4T0NTZ, S/N:S66NNE0T800169, FW:MVP40B7B, 4.09 TB
> 
> Looks current blk-zoned requires zone->len to be power_of_2() since
> commit:
> 
> 6c6b35491422 ("block: set the zone size in blk_revalidate_disk_zones atomically")
> 
> And the original power_of_2() requirement is from the following commit
> for ZBC and ZAC.
> 
> d9dd73087a8b ("block: Enhance blk_revalidate_disk_zones()")
> 
> Meantime block layer does support non-power_of_2 chunk sectors limit.

That is not true. It does. See blk_stack_limits() which has:

	/* Set non-power-of-2 compatible chunk_sectors boundary */
	if (b->chunk_sectors)
		t->chunk_sectors = gcd(t->chunk_sectors, b->chunk_sectors);

and the absence of any check on the value of chunk_sectors in
blk_queue_chunk_sectors().
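
To make the gcd() point concrete, here is a minimal user-space sketch of that
stacking rule. The names gcd_u32() and stack_chunk_sectors() are made up for
this illustration and are not kernel functions; the only thing taken from the
kernel is the rule in the snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Euclid's algorithm, standing in for the kernel's gcd() helper. */
static inline uint32_t gcd_u32(uint32_t a, uint32_t b)
{
	while (b) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical helper mirroring the quoted snippet: combine the top
 * device limit with a bottom device limit. Neither value needs to be
 * a power of 2. A bottom value of 0 (no limit) leaves top unchanged.
 */
static inline uint32_t stack_chunk_sectors(uint32_t top, uint32_t bottom)
{
	if (bottom)
		top = gcd_u32(top, bottom);
	return top;
}
```

For example, stacking a 524288-sector limit on top of the 196608-sector value
from the log above gives 65536, the largest boundary both values are a
multiple of.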

> The question is if there is such hard requirement for ZNS, and I can't see
> any such words in NVMe Zoned Namespace Command Set Specification.

No, there are no requirements in ZNS for the zone size to be a power of 2 number
of sectors/LBAs. The same is also true for ZBC and ZAC (SCSI and ATA) SMR HDDs.
The requirement for the zone size to be a power of 2 number of sectors is
entirely in the kernel. The reason is that zoned block device support started
with SMR HDDs, which all had a zone size of 256 MB (and still do), and no user
ever wanted anything other than that. So everything was coded with this
requirement, as that allowed many nice things like bit-shift/mask arithmetic for
conversions between zone number and sectors, etc. (and that of course is very
efficient).
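
As a rough illustration of the shift/mask arithmetic mentioned above, here is a
small user-space sketch. All helper names are hypothetical and this is not the
actual kernel code. It assumes 512-byte sectors, so a 256 MB zone is 524288
sectors (2^19), while the 196608 value from the log is 3 * 65536 and is not a
power of 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of 2. */
static inline bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* log2 of a value that must be a power of 2. */
static inline unsigned int ilog2_pow2(uint64_t n)
{
	unsigned int shift = 0;

	assert(is_pow2(n));
	while ((n >> shift) != 1)
		shift++;
	return shift;
}

/* Power-of-2 zone size: the zone number is a right shift. */
static inline uint64_t zone_no_shift(uint64_t sector, uint64_t zone_sectors)
{
	return sector >> ilog2_pow2(zone_sectors);
}

/* Power-of-2 zone size: the offset within the zone is a mask. */
static inline uint64_t zone_offset_mask(uint64_t sector, uint64_t zone_sectors)
{
	assert(is_pow2(zone_sectors));
	return sector & (zone_sectors - 1);
}

/* Any zone size: the same conversions need a division and a modulo. */
static inline uint64_t zone_no_div(uint64_t sector, uint64_t zone_sectors)
{
	return sector / zone_sectors;
}

static inline uint64_t zone_offset_mod(uint64_t sector, uint64_t zone_sectors)
{
	return sector % zone_sectors;
}
```

The division-based helpers work for any zone size, but code written around the
shift/mask form cannot simply be fed a zone size such as 196608.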

> So is it one NVMe firmware issue? or blk-zoned problem with too strict(power_of_2)
> requirement on zone->len?

It is the latter. There was a session at LSF/MM last year about this. I recall
that the conclusion was that unless there is strong user demand for non-power
of 2 zone sizes, we are not going to do anything about it, because allowing
non-power of 2 zone sizes has serious consequences all over the place,
including in FSes that natively support zoned devices. So relaxing that
requirement is not trivial.


-- 
Damien Le Moal
Western Digital Research



