[PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices
Matias Bjørling
Matias.Bjorling at wdc.com
Tue Mar 15 07:03:15 PDT 2022
> -----Original Message-----
> From: Javier González <javier at javigon.com>
> Sent: Tuesday, 15 March 2022 14.53
> To: Christoph Hellwig <hch at lst.de>
> Cc: Matias Bjørling <Matias.Bjorling at wdc.com>; Damien Le Moal
> <damien.lemoal at opensource.wdc.com>; Luis Chamberlain
> <mcgrof at kernel.org>; Keith Busch <kbusch at kernel.org>; Pankaj Raghav
> <p.raghav at samsung.com>; Adam Manzanares
> <a.manzanares at samsung.com>; jiangbo.365 at bytedance.com; Kanchan Joshi
> <joshi.k at samsung.com>; Jens Axboe <axboe at kernel.dk>; Sagi Grimberg
> <sagi at grimberg.me>; Pankaj Raghav <pankydev8 at gmail.com>; Kanchan Joshi
> <joshiiitr at gmail.com>; linux-block at vger.kernel.org;
> linux-nvme at lists.infradead.org
> Subject: Re: [PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices
>
> On 15.03.2022 14:30, Christoph Hellwig wrote:
> >On Tue, Mar 15, 2022 at 02:26:11PM +0100, Javier González wrote:
> >> but we do not see a usage for ZNS in F2FS, as it is a mobile
> >> file-system. As other interfaces arrive, this work will become natural.
> >>
> >> ZoneFS and btrfs are good targets for ZNS and these we can do. I
> >> would still do the work in phases to make sure we have enough early
> >> feedback from the community.
> >>
> >> Since this thread has been very active, I will wait some time for
> >> Christoph and others to catch up before we start sending code.
> >
> >Can someone summarize where we stand? Between the lack of quoting from
> >hell and overly long lines from corporate mail clients I've mostly
> >stopped reading this thread because it takes too much effort to actually
> >extract the information.
>
> Let me give it a try:
>
> - PO2 emulation in NVMe is a no-go. Drop this.
>
> - The arguments against supporting NPO2 are:
>     - It makes ZNS depart from the SMR assumption of PO2 zone sizes.
>       This can create confusion for users of both SMR and ZNS.
>
>     - Existing applications assume PO2 zone sizes, and probably do
>       optimizations for these. These applications, if they want to
>       use ZNS, will have to change the calculations.
>
>     - There is a fear of performance regressions.
>
>     - It adds more work for you and other maintainers.
>
> - The arguments in favour of NPO2 are:
>     - Unmapped LBAs create holes that applications need to deal with.
>       This affects mapping and performance due to splits. Bo explained
>       this in a thread from Bytedance's perspective. I explained in an
>       answer to Matias how we are not letting zones transition to
>       offline in order to simplify the host stack. Not sure if this is
>       something we want to bring to NVMe.
>
>     - As ZNS adds more features and other protocols add support for
>       zoned devices, we will have more use-cases for the zoned block
>       device. We will have to deal with this fragmentation at some
>       point.
>
>     - This is used in production workloads on Linux hosts. I would
>       advocate for this not being out of tree, as it will be a
>       headache for all in the future.
>
> - If you agree that removing the PO2 constraint is an option, we can
>   do the following:
>     - Remove the constraint in the block layer and add ZoneFS support
>       in a first patch.
>
>     - Add btrfs support in a later patch.
>
>     - Make changes to tools once merged.
>
> Hope I have collected all points of view in such a short format.
+ Suggestion to enable all users in the kernel to limit fragmentation and maintainer burden.
+ Possibly not a big issue, as users have already added the necessary support and must already manage offline zones and avoid writing across zones.
+ Re: Bo's email, it sounds like this only affects a single vendor, which knowingly made the decision to do NPO2 zone sizes. From Bo: "(What we discussed here has a precondition that is, we cannot determine if the SSD provider could change the FW to make it PO2 or not)".
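
For readers catching up on the thread, here is a minimal sketch of what "change the calculations" and "holes" refer to in the summary above. The helper names are hypothetical and are not taken from the block layer or from any application mentioned here; all values are in sectors. With a PO2 zone size, the zone index and in-zone offset reduce to a shift and a mask, while an NPO2 zone size needs a division and a modulo. When the zone capacity is smaller than the zone size, the difference is the unmapped range at the end of each zone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not kernel code. */

/* True when zone_size is a non-zero power of two. */
static bool zone_size_is_po2(uint64_t zone_size)
{
	return zone_size != 0 && (zone_size & (zone_size - 1)) == 0;
}

/*
 * Zone index when the zone size is a power of two.
 * zone_shift is assumed to be log2(zone_size).
 */
static uint64_t zone_no_po2(uint64_t sector, unsigned int zone_shift)
{
	return sector >> zone_shift;
}

/* Zone index for an arbitrary (possibly NPO2) non-zero zone size. */
static uint64_t zone_no_generic(uint64_t sector, uint64_t zone_size)
{
	return sector / zone_size;
}

/* Offset inside the zone; only valid when zone_size is a power of two. */
static uint64_t zone_offset_po2(uint64_t sector, uint64_t zone_size)
{
	return sector & (zone_size - 1);
}

/* Offset inside the zone for an arbitrary non-zero zone size. */
static uint64_t zone_offset_generic(uint64_t sector, uint64_t zone_size)
{
	return sector % zone_size;
}

/*
 * Number of unmapped sectors at the end of a zone when the usable
 * capacity is smaller than the zone size (assumes capacity <= size).
 */
static uint64_t zone_hole_sectors(uint64_t zone_size, uint64_t zone_capacity)
{
	return zone_size - zone_capacity;
}
```

As an example with made-up numbers: a device with a usable zone capacity of 3000 sectors that is exposed with a 4096-sector zone size leaves 1096 unmapped sectors per zone, whereas exposing 3000 directly as the zone size removes the hole but requires the generic division and modulo paths.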