[PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices

Damien Le Moal damien.lemoal at opensource.wdc.com
Thu Mar 10 15:44:33 PST 2022


On 3/11/22 00:16, Javier González wrote:
> On 10.03.2022 07:07, Keith Busch wrote:
>> On Thu, Mar 10, 2022 at 02:58:07PM +0000, Matias Bjørling wrote:
>>>>>> Yes, these drives are intended for Linux users that would use the
>>>>>> zoned block device. Append is supported, but holes in the LBA space
>>>>>> (due to the difference between zone capacity and zone size) are still a problem for these users.
>>>>>
>>>>> With respect to the specific users, what does it break specifically? What
>>>>> key features are they missing when there are holes?
>>>>
>>>> What we hear is that it breaks existing mappings in applications, where the
>>>> address space is seen as contiguous; with holes, the application needs to account
>>>> for the unmapped space. This affects performance and CPU usage due to unnecessary
>>>> splits, for both reads and writes.
>>>>
>>>> For more details, I guess they will have to jump in and share the parts that
>>>> they consider proper to share on the mailing list.
>>>>
>>>> I guess we will have more conversations around this as we push the block
>>>> layer changes after this series.
>>>
>>> Ok, so I hear that one issue is I/O splits. If I assume that reads
>>> are sequential and zone cap/size is between 100MiB and 1GiB, then my gut
>>> feeling tells me it's less CPU intensive to split every 100MiB to
>>> 1GiB of reads than it would be to give up power-of-2 zones, given
>>> the extra per-I/O calculations that would require.
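The per-I/O cost mentioned above comes from mapping a sector to its zone. A minimal userspace sketch, assuming sectors are counted in 512-byte units and using hypothetical helper names (not the kernel's own): with a power-of-2 zone size, the zone number and in-zone offset are a shift and a mask; with an arbitrary zone size, they need a 64-bit division and modulo.

```c
#include <stdint.h>

/* Power-of-2 zone size (1 << zone_shift sectors): zone number via shift. */
static inline uint64_t zone_no_po2(uint64_t sector, unsigned int zone_shift)
{
    return sector >> zone_shift;
}

/* Power-of-2 zone size: offset inside the zone via mask. */
static inline uint64_t zone_off_po2(uint64_t sector, unsigned int zone_shift)
{
    return sector & ((UINT64_C(1) << zone_shift) - 1);
}

/* Arbitrary zone size: zone number needs a 64-bit division. */
static inline uint64_t zone_no_div(uint64_t sector, uint64_t zone_sectors)
{
    return sector / zone_sectors;
}

/* Arbitrary zone size: offset inside the zone needs a 64-bit modulo. */
static inline uint64_t zone_off_div(uint64_t sector, uint64_t zone_sectors)
{
    return sector % zone_sectors;
}
```

For example, with `zone_shift = 19` (256 MiB zones), sector 1572869 falls in zone 3 at offset 5; with a 200000-sector zone size, sector 600005 also maps to zone 3, offset 5, but only through a division.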
>>
>> Don't you need to split anyway when spanning two zones to avoid the zone
>> boundary error?
> 
> If you have size = capacity then you can do a cross-zone read. This is
> only a problem when we have gaps.
> 
>> Maybe this is a silly idea, but it would be a trivial device-mapper
>> target to remap the gaps out of the LBA range.
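The remapping suggested above is mostly sector arithmetic. A minimal sketch, assuming a hypothetical target that exposes only the first `zone_cap` sectors of each zone back to back; the struct and function names are invented for illustration and are not an existing device-mapper API.

```c
#include <stdint.h>

/* Hypothetical geometry of the backing ZNS namespace, in 512-byte sectors. */
struct unhole_geo {
    uint64_t zone_size; /* distance between zone starts (may be a power of 2) */
    uint64_t zone_cap;  /* usable sectors per zone, zone_cap <= zone_size */
};

/* Map a sector of the gap-free logical device to the backing device:
 * logical zone i covers [i * zone_cap, (i + 1) * zone_cap). */
static uint64_t unhole_map_sector(const struct unhole_geo *g, uint64_t lsector)
{
    uint64_t zone = lsector / g->zone_cap;
    uint64_t off = lsector % g->zone_cap;

    return zone * g->zone_size + off;
}

/* Sectors left before the current zone's capacity ends; a request longer
 * than this would run into the gap, so a real target would have to split it. */
static uint64_t unhole_max_io(const struct unhole_geo *g, uint64_t lsector)
{
    return g->zone_cap - lsector % g->zone_cap;
}
```

With `zone_size = 2048` and `zone_cap = 1500`, logical sector 1500 lands at physical sector 2048, the start of the second zone, and `unhole_max_io()` marks where a request would still need to be split.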
> 
> One thing we have considered is that, as we remove the PO2 constraint
> from the block layer, devices exposing PO2 zone sizes would be able to
> do the emulation the other way around to support things like this.
> 
> A device mapper target is also a fine place to put this, but it seems
> like a very simple task. Is it worth writing all the device mapper
> boilerplate code only for this?

Boilerplate? DM already supports zoned devices. Writing a "dm-unhole"
target would be extremely simple, as it would essentially be a variation
of dm-linear. No DM core changes should be needed.

-- 
Damien Le Moal
Western Digital Research



More information about the Linux-nvme mailing list