[PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices
Javier González
javier at javigon.com
Thu Mar 10 07:16:28 PST 2022
On 10.03.2022 07:07, Keith Busch wrote:
> On Thu, Mar 10, 2022 at 02:58:07PM +0000, Matias Bjørling wrote:
>>>>> Yes, these drives are intended for Linux users that would use the
>>>>> zoned block device. Append is supported, but holes in the LBA space
>>>>> (due to the difference between zone capacity and zone size) are
>>>>> still a problem for these users.
>>>>
>>>> With respect to the specific users, what does it break specifically?
>>>> What key features are they missing when there are holes?
>>>
>>> What we hear is that it breaks existing mappings in applications,
>>> where the address space is seen as contiguous; with holes, they need
>>> to account for the unmapped space. This affects performance and CPU
>>> usage due to unnecessary splits. This is for both reads and writes.
>>>
>>> For more details, I guess they will have to jump in and share the
>>> parts they consider proper to share on the mailing list.
>>>
>>> I guess we will have more conversations around this as we push the
>>> block layer changes after this series.
>>
>> Ok, so I hear that one issue is I/O splits. If I assume that reads
>> are sequential and zone cap/size is between 100MiB and 1GiB, then my
>> gut feeling tells me it's less CPU-intensive to split reads every
>> 100MiB to 1GiB than it would be to drop power-of-2 zones and pay the
>> extra per-I/O calculations.
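For context, the per-I/O math in question looks roughly like this (a
sketch of mine, not code from this series): with a power-of-2 zone size,
the zone number and in-zone offset reduce to a shift and a mask, while a
non-power-of-2 size needs a 64-bit division and modulo on every I/O.

    #include <stdint.h>

    /* Power-of-2 case: zone_bits = ilog2(zone size in sectors). */
    static inline uint64_t zone_no_po2(uint64_t lba, unsigned int zone_bits)
    {
            return lba >> zone_bits;
    }

    static inline uint64_t zone_off_po2(uint64_t lba, unsigned int zone_bits)
    {
            return lba & ((1ULL << zone_bits) - 1);
    }

    /* Non-power-of-2 case: division and modulo per I/O. */
    static inline uint64_t zone_no_npo2(uint64_t lba, uint64_t zone_sectors)
    {
            return lba / zone_sectors;
    }

    static inline uint64_t zone_off_npo2(uint64_t lba, uint64_t zone_sectors)
    {
            return lba % zone_sectors;
    }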
>
> Don't you need to split anyway when spanning two zones to avoid the
> zone boundary error?
If you have size = capacity, then you can do a cross-zone read. This is
only a problem when we have gaps.
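To illustrate the split that gaps force (a sketch, not from the series):
any range crossing the zone's capacity boundary has to be clipped at the
start of the hole, and the remainder reissued at the next zone.

    #include <stdint.h>

    /*
     * Sketch only: clip a read at the start of a zone's hole when
     * zone size > zone capacity. Assumes lba itself lands in mapped
     * space; the remainder continues at the next zone's start.
     */
    static uint64_t first_split_len(uint64_t lba, uint64_t len,
                                    uint64_t zone_size, uint64_t zone_cap)
    {
            uint64_t zone_start = (lba / zone_size) * zone_size;
            uint64_t mapped_end = zone_start + zone_cap;

            if (lba + len <= mapped_end)
                    return len;              /* no gap crossed, no split */
            return mapped_end - lba;         /* issue this, split the rest */
    }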
> Maybe this is a silly idea, but it would be a trivial device-mapper
> to remap the gaps out of the lba range.
One thing we have considered is that, as we remove the PO2 constraint
from the block layer, devices exposing PO2 zone sizes could do the
emulation the other way around to support things like this.

A device mapper is also a fine place to put this, but it seems like a
very simple task. Is it worth all the boilerplate code of a device
mapper target only for this?
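For reference, the core of that remap is tiny; a sketch assuming a
hypothetical target that exposes zone_cap-sized zones back to back and
hides the device's holes:

    #include <stdint.h>

    /*
     * Sketch of the remap being discussed (hypothetical, not an
     * existing target): translate a contiguous logical LBA space made
     * of zone_cap-sized zones to the device's zone_size-spaced layout,
     * so applications never see the holes.
     */
    static uint64_t to_device_lba(uint64_t logical_lba,
                                  uint64_t zone_size, uint64_t zone_cap)
    {
            uint64_t zone = logical_lba / zone_cap;
            uint64_t off  = logical_lba % zone_cap;

            return zone * zone_size + off;
    }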