[PATCH 0/6] power_of_2 emulation support for NVMe ZNS devices

Matias Bjørling Matias.Bjorling at wdc.com
Tue Mar 15 03:45:43 PDT 2022


> > > On Mon, Mar 14, 2022 at 02:16:36PM +0000, Matias Bjørling wrote:
> > > > I want to turn the argument around to see it from the kernel
> > > > developer's point of view. They have communicated the PO2
> > > > requirement clearly,
> > >
> > > Such a requirement is based on history and the effort put in place
> > > to assume PO2 for zoned storage, and clearly it is not a hard one.
> > > And clearly even vendors who have embraced PO2 don't know for sure
> > > they'll always be able to stick to PO2...
> >
> > Sure - it'd be naïve to give a carte-blanche promise.
> 
> Exactly. So taking a position of not supporting NPO2 seems counterproductive
> to the future of ZNS. The question should be *how* to best do this in light
> of what we need to support, while avoiding performance regressions and
> striving to avoid fragmentation.

Having non-power-of-2 zone sizes is a deviation from the existing devices used in full production today. That there is a wish to introduce support for such drives is interesting, but it should be seen against the background and development of zoned devices. Damien mentioned that SMR HDDs didn't start off with PO2 zone sizes - that became the norm due to its overall benefits. I.e., drives with NPO2 zone sizes are the odd ones out, and in some views, the ones creating fragmentation.
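
To make the "overall benefits" point concrete: with a PO2 zone size, the
zone number and the in-zone offset reduce to a shift and a mask on the I/O
path, whereas an NPO2 size forces a 64-bit division and modulo for the same
lookups. A minimal sketch (illustrative helpers, not the kernel's actual
code):

  #include <stdint.h>

  /* PO2 zone size: zone number and in-zone offset are a shift and a mask. */
  static inline uint64_t zone_no_po2(uint64_t sector, unsigned int zone_bits)
  {
          return sector >> zone_bits;
  }

  static inline uint64_t zone_off_po2(uint64_t sector, unsigned int zone_bits)
  {
          return sector & ((1ULL << zone_bits) - 1);
  }

  /* NPO2 zone size: the same lookups need a 64-bit divide and modulo. */
  static inline uint64_t zone_no_npo2(uint64_t sector, uint64_t zone_sectors)
  {
          return sector / zone_sectors;
  }

  static inline uint64_t zone_off_npo2(uint64_t sector, uint64_t zone_sectors)
  {
          return sector % zone_sectors;
  }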

That there is a wish to revisit that design decision is fair, and it sounds like there is willingness to explore such options. But please be advised that the Linux community has communicated this specific requirement for a long time, precisely to avoid this issue. The community has thus been trying to help vendors make the appropriate design decisions, such that they could take advantage of the Linux kernel stack from day one.

> > However, you're skipping the next two elements, which state both that
> > there is good precedent for working with PO2 zone sizes and that
> > holes/unmapped LBAs can't be avoided.
> 
> I'm not, but I admit it's a good point that the possibility of zones being
> taken offline also implies holes. I also think it was a good exercise to
> discuss and evaluate emulation, given that I don't think this point of yours
> would have been made clear otherwise. This is why I treat ZNS as an evolving
> effort, and I can't seriously take any position stating all answers are known.

That's good to hear. I would note that some members of this thread have been doing zoned storage for close to a decade and have a very thorough understanding of the zoned storage model - so it may be a stretch for them to hear that you consider everything to be up in the air and early. This stack is already used for a large percentage of the bits shipped in the world. Thus, there is an interest in maintaining it and making sure things don't regress.
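
To make the hole point above concrete: ZNS reports a per-zone capacity
(ZCAP) that may be smaller than the zone size (ZSZE), so the LBAs in
[ZSLBA + ZCAP, ZSLBA + ZSZE) are unmapped even on a healthy PO2 device.
A minimal sketch of the check every zoned-aware user already has to make
(field names follow the spec; the struct and helper are illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  struct zone_desc {
          uint64_t zslba; /* zone start LBA */
          uint64_t zcap;  /* zone capacity: number of usable LBAs */
          uint64_t zsze;  /* zone size: LBA distance to the next zone */
  };

  /* LBAs in [zslba + zcap, zslba + zsze) are a hole and must be skipped. */
  static bool lba_is_usable(const struct zone_desc *z, uint64_t lba)
  {
          return lba >= z->zslba && lba < z->zslba + z->zcap;
  }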

> 
> > Making an argument for why NPO2
> > zone sizes may not bring what one is looking for. It's a lot of work
> > for little practical change, if any.
> 
> NAND does not impose a PO2 requirement; that should be enough to imply
> that NPO2 zones *can* be expected. And if no vendor wants to take the
> position that they know for a fact they'll never adopt NPO2 zones, that
> should be enough to keep an open mind and consider *how* to support them.

As long as it doesn't also imply that support *has* to be added to the kernel, that's okay.

<snip>
> 
> > If evaluating different approaches, it would be helpful to the
> > reviewers if the interfaces and all of their kernel users were
> > converted in a single patchset. This would also help avoid users
> > getting hit by what is and isn't supported by a particular device
> > implementation, and allow a better review of the full set of changes
> > required to add the support.
> 
> Sorry I didn't understand the suggestion here, can you clarify what it is you are
> suggesting?

It would help reviewers if a potential patchset converted all users (e.g., f2fs, btrfs, device mappers, io schedulers, etc.), such that the full effect can be evaluated - with the added benefit that end-users would not have to think about what is and isn't supported.
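
To illustrate why a partial conversion is risky: PO2 assumptions typically
hide as shift/mask arithmetic in each user, so any user left unconverted
would silently miscompute zone boundaries on an NPO2 device. A sketch of
the pattern a full conversion has to hunt down (not actual kernel code):

  #include <stdint.h>

  /* PO2-only: masks down to a power-of-2 boundary; wrong for NPO2 sizes. */
  static inline uint64_t zone_start_po2(uint64_t sector, unsigned int zone_bits)
  {
          return sector & ~((1ULL << zone_bits) - 1);
  }

  /* The generic form every converted user would need instead. */
  static inline uint64_t zone_start(uint64_t sector, uint64_t zone_sectors)
  {
          return sector - (sector % zone_sectors);
  }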

