[PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices

Matias Bjørling Matias.Bjorling at wdc.com
Tue Mar 15 07:03:15 PDT 2022


> -----Original Message-----
> From: Javier González <javier at javigon.com>
> Sent: Tuesday, 15 March 2022 14.53
> To: Christoph Hellwig <hch at lst.de>
> Cc: Matias Bjørling <Matias.Bjorling at wdc.com>; Damien Le Moal
> <damien.lemoal at opensource.wdc.com>; Luis Chamberlain
> <mcgrof at kernel.org>; Keith Busch <kbusch at kernel.org>; Pankaj Raghav
> <p.raghav at samsung.com>; Adam Manzanares
> <a.manzanares at samsung.com>; jiangbo.365 at bytedance.com; kanchan Joshi
> <joshi.k at samsung.com>; Jens Axboe <axboe at kernel.dk>; Sagi Grimberg
> <sagi at grimberg.me>; Pankaj Raghav <pankydev8 at gmail.com>; Kanchan Joshi
> <joshiiitr at gmail.com>; linux-block at vger.kernel.org; linux-
> nvme at lists.infradead.org
> Subject: Re: [PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices
> 
> On 15.03.2022 14:30, Christoph Hellwig wrote:
> >On Tue, Mar 15, 2022 at 02:26:11PM +0100, Javier González wrote:
> >> but we do not see a usage for ZNS in F2FS, as it is a mobile
> >> file-system. As other interfaces arrive, this work will become natural.
> >>
> >> ZoneFS and btrfs are good targets for ZNS and these we can do. I
> >> would still do the work in phases to make sure we have enough early
> >> feedback from the community.
> >>
> >> Since this thread has been very active, I will wait some time for
> >> Christoph and others to catch up before we start sending code.
> >
> >Can someone summarize where we stand?  Between the lack of quoting from
> >hell and overly long lines from corporate mail clients I've mostly
> >stopped reading this thread because it takes too much effort to
> >actually extract the information.
> 
> Let me give it a try:
> 
>   - PO2 emulation in NVMe is a no-go. Drop this.
> 
>   - The arguments against supporting NPO2 zone sizes are:
>       - It makes ZNS depart from the SMR assumption of PO2 zone sizes.
>         This can create confusion for users of both SMR and ZNS.
> 
>       - Existing applications assume PO2 zone sizes, and probably do
>         optimizations for these. To use ZNS, these applications will
>         have to change their calculations.
> 
>       - There is a fear of performance regressions.
> 
>       - It adds more work to you and other maintainers
> 
>   - The arguments in favour of NPO2 zone sizes are:
>       - Unmapped LBAs create holes that applications need to deal with.
>         This affects mapping and performance due to splits. Bo explained
>         this in a thread from Bytedance's perspective.  I explained in an
>         answer to Matias how we are not letting zones transition to
>         offline in order to simplify the host stack. Not sure if this is
>         something we want to bring to NVMe.
> 
>       - As ZNS adds more features and other protocols add support for
>         zoned devices we will have more use-cases for the zoned block
>         device. We will have to deal with this fragmentation at some
>         point.
> 
>       - This is used in production workloads in Linux hosts. I would
>         advocate for this not being off-tree as it will be a headache for
>         all in the future.
> 
>   - If you agree that removing the PO2 constraint is an option, we can
>     do the following:
>       - Remove the constraint in the block layer and add ZoneFS support
>         in a first patch.
> 
>       - Add btrfs support in a later patch
> 
>       - Make changes to tools once merged
> 
> Hope I have collected all points of view in such a short format.

+ Suggestion to enable all users in the kernel to limit fragmentation
  and maintainer burden.
+ Possibly not a big issue, as users have already added the necessary
  support and must already manage offline zones and avoid writing
  across zones.
+ Re: Bo's email, it sounds like this only affects a single vendor,
  which knowingly made the decision to use NPO2 zone sizes. From Bo:
  "(What we discussed here has a precondition that is, we cannot
  determine if the SSD provider could change the FW to make it PO2 or
  not)".
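
For readers skimming the summary, the "change the calculations" and
"unmapped LBAs" points above can be sketched roughly as below. This is
only an illustration under stated assumptions: the helper names are
hypothetical and are not block layer or NVMe driver API, and the zone
size and capacity values in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not kernel API. */

static inline bool is_po2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* PO2 zone size: zone number and in-zone offset via shift and mask. */
static inline uint64_t zone_no_po2(uint64_t sector, unsigned int zone_shift)
{
	return sector >> zone_shift;
}

static inline uint64_t zone_off_po2(uint64_t sector, uint64_t zone_sectors)
{
	assert(is_po2(zone_sectors));
	return sector & (zone_sectors - 1);
}

/* Any zone size (PO2 or NPO2): falls back to division and modulo. */
static inline uint64_t zone_no_generic(uint64_t sector, uint64_t zone_sectors)
{
	return sector / zone_sectors;
}

static inline uint64_t zone_off_generic(uint64_t sector, uint64_t zone_sectors)
{
	return sector % zone_sectors;
}

/*
 * When the zone capacity is smaller than the zone size, the sectors
 * between capacity and size are unmapped: this is the "hole" that
 * applications have to skip.
 */
static inline bool in_unmapped_hole(uint64_t sector, uint64_t zone_sectors,
				    uint64_t zone_capacity)
{
	return (sector % zone_sectors) >= zone_capacity;
}
```

With an assumed 256 MiB PO2 zone size (524288 sectors of 512 bytes,
shift 19), sector 1048676 lands in zone 2 at offset 100 using shift and
mask. With an assumed 96 MiB NPO2 zone size (196608 sectors), sector
589829 lands in zone 3 at offset 5 using division and modulo. If a
524288-sector zone only has 196608 sectors of capacity, in-zone offset
200000 falls into the unmapped hole.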

