[EXT] Re: [PATCHv11 0/9] write hints with nvme fdp and scsi streams
Christoph Hellwig
hch at lst.de
Tue Nov 12 20:47:36 PST 2024
On Tue, Nov 12, 2024 at 06:18:21PM +0000, Pierre Labat wrote:
> Overall, it seems to me that the difficulty here comes from 2 things:
> 1) The write hints may have different semantics (temperature, FDP placement, and whatever will come next).
Or rather trying to claim all these are "write hints" is simply the wrong
approach :)
> 2) Different software layers may want to use the hints, and if several do that at the same time on the same storage that may result in a mess.
That's a very nice but almost too harmless way to phrase it.
> About 1)
> Seems to me that having a different interface for each semantic is an overkill, extra code to maintain. And extra work when a new semantic comes along.
> To keep things simple, keep one set of interfaces (per IO interface, per file interface) for all write hints semantics, and carry the difference in semantic in the hint itself.
> For example, with 32-bit hints, store the semantic in 8 bits and then use the rest in the context of that semantic.
This is very similar to what Kanchan did, which was never followed up on.
I think this is a lot better than blindly overloading a generic
"write hint", but still suffers from problems:
a) the code is a lot more complex and harder to maintain than just two
different values
b) it still keeps the idea that a simple temperature hint and write
stream or placement identifiers are something comparable, which they
really aren't.
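
To make the trade-off concrete, here is a minimal sketch of the kind of
encoding Pierre describes, assuming the semantic lives in the top 8 bits
of a 32-bit hint and the payload in the low 24. All names and values
below are hypothetical and do not come from any posted patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: top 8 bits name the semantic, low 24 bits carry its payload. */
#define HINT_SEMANTIC_SHIFT 24
#define HINT_PAYLOAD_MASK   0x00FFFFFFu

/* Hypothetical semantic tags, for illustration only. */
enum hint_semantic {
	HINT_SEM_NONE        = 0,
	HINT_SEM_TEMPERATURE = 1,	/* payload: a relative lifetime class */
	HINT_SEM_PLACEMENT   = 2,	/* payload: a stream or placement identifier */
};

/* Combine a semantic tag and a payload into one 32-bit hint. */
static inline uint32_t hint_pack(enum hint_semantic sem, uint32_t payload)
{
	assert(payload <= HINT_PAYLOAD_MASK);	/* payload must fit in 24 bits */
	return ((uint32_t)sem << HINT_SEMANTIC_SHIFT) | payload;
}

/* Extract the semantic tag from a packed hint. */
static inline enum hint_semantic hint_semantic_of(uint32_t hint)
{
	return (enum hint_semantic)(hint >> HINT_SEMANTIC_SHIFT);
}

/* Extract the payload; its meaning depends entirely on the semantic tag. */
static inline uint32_t hint_payload_of(uint32_t hint)
{
	return hint & HINT_PAYLOAD_MASK;
}
```

Every consumer of such a hint has to switch on the semantic before it
can interpret the payload, which is where the extra complexity in a)
comes from, and the shared type still suggests the two kinds of payload
are interchangeable, which is the objection in b).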
> About 2)
> Provide a simple way for the user to decide which layer generates write hints.
> As an example, as some of you pointed out, what if the filesystem wants to generate write hints to optimize its [own] data handling by the storage, and at the same time the application using the FS understands the storage and also wants to optimize using write hints.
> Both use cases are legit, I think.
> To handle that in a simple way, why not have a filesystem mount parameter enabling/disabling the use of write hints by the FS?
The file system is, and always has been, the entity in charge of
resource allocation of the underlying device. Bypassing it will get
you in trouble, and a simple mount option isn't really changing that
(it's also not exactly a scalable interface).
If an application wants to micro-manage placement decisions it should not
use a file system, or at least not a normal one with Posix semantics.
That being said, we've demonstrated that applications using proper grouping
of data by file and the simple temperature hints can get very good results
from file systems that can interpret them, without a lot of work in the
file system. I suspect that for most applications that actually want files,
this is going to give better results than micro-management that tries to
bypass the file system.
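
For reference, a minimal sketch of what setting such a temperature hint
looks like from an application, assuming "simple temperature hints"
means the per-inode write lifetime hints set with
fcntl(F_SET_RW_HINT) on Linux. The helper name is hypothetical, and the
fallback defines are only there for libc headers that do not expose the
constants.

```c
#include <fcntl.h>
#include <stdint.h>

/* Fallbacks for libc headers that lack these; values match Linux <linux/fcntl.h>. */
#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT 1036	/* F_LINUX_SPECIFIC_BASE (1024) + 12 */
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT 2
#endif
#ifndef RWH_WRITE_LIFE_LONG
#define RWH_WRITE_LIFE_LONG 4
#endif

/*
 * Hypothetical helper: tag every future write to this file with one
 * expected data lifetime. Returns 0 on success, -1 with errno set.
 */
static inline int set_file_write_lifetime(int fd, uint64_t lifetime)
{
	/* The kernel reads the hint through a pointer to a 64-bit value. */
	return fcntl(fd, F_SET_RW_HINT, &lifetime);
}
```

An application that groups its data by file would call this once per
file, for example with RWH_WRITE_LIFE_SHORT on a log or journal and
RWH_WRITE_LIFE_LONG on rarely rewritten data, and leave the actual
placement to the file system.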
I'm not sure if Keith was just ranting last night, but IFF the assumption
here is that file systems are just used as dumb containers and applications
manage device level placement inside them we have a much deeper problem
than just interface semantics.