[PATCHv11 0/9] write hints with nvme fdp and scsi streams

Christoph Hellwig hch at lst.de
Tue Nov 12 08:50:54 PST 2024


On Tue, Nov 12, 2024 at 07:25:45AM -0700, Keith Busch wrote:
> > I feel like banging my head against the wall.  No, passing through write
> > streams is simply not acceptable without the file system being in
> > control.  I've said and explained this in detail about a dozen times
> > and the file system actually needing to do data separation for its own
> > purpose doesn't go away by ignoring it.
> 
> But that's just an ideological decision that doesn't jibe with how
> people use these.

Sorry, but no it is not.  The file system is the entity that owns the
block device, and it is the layer that manages the block device.
Bypassing it is a layering violation that creates a lot of problems
and solves none at all.

> The applications know how they use their data better
> than the filesystem,

That is a very bold assumption, and a clear indication that you are
actually approaching this with a rather ideological hat.  If your
specific application actually thinks it knows the storage better than
the file system that you are using you probably should not be using
that file system.  Use a raw block device or even better passthrough
or spdk if you really know what you are doing (or at least think so).

Otherwise you need to agree that the file system is the final arbiter
of the underlying device resource.  Hint: if you have an application
that knows what it is doing (there actually are a few of those) it's
usually not hard to actually work with file system people to create
abstractions that don't poke holes into layering but still give the
applications what you want.  There's also the third option of doing
something like what Damien did with zonefs and actually create an
abstraction for what you are doing.

> so putting the filesystem in the way to force
> streams to look like zones is just an unnecessary layer of indirection
> getting in the way.

Can you please stop this BS?  Even if a file system doesn't treat
write streams like zones and keeps LBA space and physical allocation units
entirely separate (for which I see no good reason, but others might
disagree) you still need the file system in control of the hardware
resources.
