Re: [PATCHv11 0/9] write hints with nvme fdp and scsi streams
Christoph Hellwig
hch at lst.de
Mon Nov 18 23:15:56 PST 2024
On Mon, Nov 18, 2024 at 04:37:08PM -0700, Keith Busch wrote:
> We have an API that has existed for 10+ years. You are gatekeeping that
> interface by declaring NVMe's FDP is not allowed to use it. Do I have
> that wrong? You initially blocked this because you didn't like how the
> spec committee worked. Now you've shifted to trying to pretend FDP
> devices require explicit filesystem handholding that was explicitly NOT
> part of that protocol.
I'm not fucking gatekeeping anything. I'm really tired of this claim
with absolutely no facts backing it up.
> > And as iterated multiple times you are doing that by bypassing the
> > file system layer in a forceful way that breaks all abstractions and
> > makes your feature unavailable for file systems.
>
> Your filesystem layering breaks the abstraction and capabilities the
> drives are providing. You're doing more harm than good trying to game
> how the media works here.
How so?
> > I've also thrown you a nugget by first explaining and then even writing
> > prototype code to show how you get what you want while using the proper
> > abstractions.
>
> Oh, the untested prototype that wasn't posted to any mailing list for
> a serious review? The one that forces FDP to subscribe to the zoned
> interface only for XFS, despite these devices being squarely in the
> "conventional" SSD category and absolutely NOT zoned devices? Despite the
> fact that I have other users successfully using other filesystems with the
> existing interfaces, which your prototype doesn't do a thing for? Yah, thanks...
What zoned interface to FDP?
The exposed interface is to:
a) pick a write stream
b) expose the size of the reclaim unit
not done yet, but needed for good operation:
c) expose how much capacity in a reclaim unit has been written
This is about as good as it gets to map the FDP (and to a lesser extent
streams) interface to an abstract block layer API. If you have a better
suggestion to actually expose these capabilities I'm all ears.
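To make the three-part interface listed above concrete, here is a minimal Python sketch that models it in user space. `WriteStreamDevice`, `write()` and `remaining()` are hypothetical names for illustration only, not a kernel or NVMe API; the sketch assumes capacity is tracked per write stream and that a write simply rolls over into a fresh reclaim unit once the current one is full.

```python
from dataclasses import dataclass, field


@dataclass
class WriteStreamDevice:
    """Hypothetical model of the write stream interface (not a real API)."""

    nr_streams: int          # a) how many write streams a writer can pick from
    reclaim_unit_bytes: int  # b) size of one reclaim unit
    # c) bytes already written into each stream's current reclaim unit
    _written: list = field(init=False, repr=False)

    def __post_init__(self) -> None:
        self._written = [0] * self.nr_streams

    def write(self, stream: int, nbytes: int) -> int:
        """Record a write to `stream`; return how many reclaim units it filled."""
        if not 0 <= stream < self.nr_streams:
            raise ValueError(f"invalid write stream {stream}")
        total = self._written[stream] + nbytes
        # Whole reclaim units are "closed"; the remainder starts a new one.
        filled, self._written[stream] = divmod(total, self.reclaim_unit_bytes)
        return filled

    def remaining(self, stream: int) -> int:
        """Capacity left in the stream's current reclaim unit."""
        return self.reclaim_unit_bytes - self._written[stream]


dev = WriteStreamDevice(nr_streams=4, reclaim_unit_bytes=1 << 20)
dev.write(1, 768 << 10)  # fills no reclaim unit yet
dev.remaining(1)         # 256 KiB left in stream 1's current unit
```

Item c) is what lets a writer decide whether the next extent still fits in the current reclaim unit or should be deferred or sent to another stream.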
Now _my_ preferred use of that interface is a write-out-of-place file
system that maps LBA regions to physical reclaim blocks. On the one
hand because it actually fits the file system I'm writing, on the other
hand because industry experience has shown that this is a very good
fit for flash storage (even without any explicit placement). If you
think that's all wrong, that's fine, but despite your claims to the
contrary, absolutely nothing in the interface forces you to do that.
You can roll the dice for your LBA allocations and write them using
a secure random number generator. The interface allows for all of that,
but I doubt your results will be all that great. Not my business.
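One way to picture the write-out-of-place design described above is a log-structured allocator in which each LBA region is exactly one reclaim unit, each write stream fills its own open region, and a rewrite of a logical block always lands at a new LBA. This is a hypothetical Python sketch assuming fixed-size regions, with garbage collection left out entirely; `OutOfPlaceAllocator` is an illustrative name, not code from the prototype mentioned in this thread.

```python
class OutOfPlaceAllocator:
    """Hypothetical sketch: one LBA region per reclaim unit, never overwrite."""

    def __init__(self, nr_regions: int, region_blocks: int) -> None:
        self.region_blocks = region_blocks   # blocks per region (assumed = one reclaim unit)
        self.free = list(range(nr_regions))  # regions not yet handed out
        self.open = {}                       # stream -> [region, next free offset]
        self.map = {}                        # logical block -> current physical LBA

    def write(self, stream: int, logical_block: int) -> int:
        """Place a (re)write of `logical_block` out of place; return its new LBA."""
        cur = self.open.get(stream)
        if cur is None or cur[1] == self.region_blocks:
            # Current region for this stream is full (or none yet): open a new one.
            if not self.free:
                raise RuntimeError("no free region; garbage collection needed")
            cur = self.open[stream] = [self.free.pop(0), 0]
        lba = cur[0] * self.region_blocks + cur[1]
        cur[1] += 1
        # Any older copy of logical_block becomes stale; it is never rewritten in place.
        self.map[logical_block] = lba
        return lba


a = OutOfPlaceAllocator(nr_regions=4, region_blocks=2)
a.write(0, 10)  # stream 0 opens region 0
a.write(0, 10)  # the rewrite goes to a new LBA in the same region
```

The design choice being illustrated: since old copies are never rewritten in place, data written together through one stream tends to be invalidated together, so ideally a whole region can be reclaimed with little live data left to move.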
> I appreciate you put the time into getting your thoughts into actual
> code and it does look very valuable for ACTUAL ZONED block devices. But
> it seems to have missed the entire point of what this hardware feature
> does. If you're doing low level media garbage collection with FDP and
> tracking fake media write pointers, then you're doing it wrong. Please
> use Open Channel and ZNS SSDs if you want that interface and stop
> gatekeeping the EXISTING interface that has proven value in production
> software today.
Hey, feel free to come up with a better design. The whole point of a
proper block layer design is that you actually can do that!
> > But instead of picking up on that you just whine like
> > this. Either spend a little bit of effort to actually get the interface
> > right or just shut up.
>
> Why the fuck should I make an effort to improve your pet project that
> I don't have a customer for? They want to use the interface that was
> created 10 years ago, exactly for the reason it was created, and no one
> wants to introduce the risks of an untested and unproven major and
> invasive filesystem and block stack change in the kernel in the near
> term!
Because you apparently want an interface to FDP in the block layer. And
if you want that you need to stop bypassing the file systems as pointed
out not just by me but also at least one other file system maintainer
and the maintainer of the most used block subsystem. I've thrown you
some bones how that can be done while doing everything else you did
before (at least assuming you get the fs side past the fs maintainers),
but the only thanks for that is bullshit attacks at a personal level.
More information about the Linux-nvme mailing list