[EXT] Re: [PATCHv11 0/9] write hints with nvme fdp and scsi streams

Pierre Labat plabat at micron.com
Tue Nov 12 10:18:21 PST 2024


My 2 cents.

Overall, it seems to me that the difficulty here comes from 2 things:
1)  The write hints may have different semantics (temperature, FDP placement, and whatever comes next).
2) Different software layers may want to use the hints, and if several do so at the same time on the same storage, the result may be a mess.

About 1)
Having a different interface for each semantic seems like overkill to me: extra code to maintain, and extra work each time a new semantic comes along.
To keep things simple, keep one set of interfaces (per IO interface, per file interface) for all write hint semantics, and carry the difference in semantics in the hint itself.
For example, with 32-bit hints, store the semantic in 8 bits and use the remaining 24 bits in the context of that semantic.
The storage transport driver (the nvme driver, for example) then uses the 8-bit semantic field to translate the write hint appropriately for the storage device.
The storage driver can support several translations, one for each supported semantic. Linux doesn't need to yank out one translation to replace it with another/new one.
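
To make the encoding idea concrete, here is a rough sketch in plain C of a 32-bit hint with the semantic in the top 8 bits. Everything in it is made up for illustration: the semantic IDs, the helper names, the temperature range, and the choice of the top byte are assumptions, not existing kernel definitions.

```c
#include <stdint.h>

/* Hypothetical semantic IDs; not taken from any kernel header. */
enum hint_semantic {
	HINT_SEM_NONE        = 0,
	HINT_SEM_TEMPERATURE = 1,
	HINT_SEM_FDP         = 2,
};

#define HINT_SEM_SHIFT   24           /* assumption: semantic in the top 8 bits */
#define HINT_VALUE_MASK  0x00FFFFFFu  /* remaining 24 bits belong to the semantic */
#define HINT_TEMP_MAX    5u           /* assumption: 6 temperature classes, 0..5 */

/* Build a hint from a semantic and a semantic-specific value. */
static inline uint32_t hint_pack(enum hint_semantic sem, uint32_t value)
{
	return ((uint32_t)sem << HINT_SEM_SHIFT) | (value & HINT_VALUE_MASK);
}

static inline enum hint_semantic hint_semantic_of(uint32_t hint)
{
	return (enum hint_semantic)(hint >> HINT_SEM_SHIFT);
}

static inline uint32_t hint_value_of(uint32_t hint)
{
	return hint & HINT_VALUE_MASK;
}

/*
 * Driver-side translation, one case per supported semantic.
 * Returns a device-level identifier, or -1 if the hint is not usable.
 * The identity mappings below are placeholders for real translations.
 */
static inline int32_t driver_translate_hint(uint32_t hint)
{
	uint32_t value = hint_value_of(hint);

	switch (hint_semantic_of(hint)) {
	case HINT_SEM_TEMPERATURE:
		return value <= HINT_TEMP_MAX ? (int32_t)value : -1;
	case HINT_SEM_FDP:
		return (int32_t)value;  /* treated as a placement identifier */
	default:
		return -1;              /* unknown semantic: ignore the hint */
	}
}
```

Adding a new semantic would then mean adding one enum value and one case in the driver, with the interfaces above it left unchanged.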

About 2)
Give the user a simple way to decide which layer generates write hints.
As an example, as some of you pointed out, what if the filesystem wants to generate write hints to optimize how the storage handles its [own] data, and at the same time the application using the FS understands the storage and also wants to optimize using write hints?
Both use cases are legit, I think.
To handle that in a simple way, why not have a filesystem mount parameter enabling/disabling the use of write hints by the FS?
If the application doesn't need or want to use write hints on its own, the user would mount the filesystem with FS generation of write hints enabled. That could be the default.
Conversely, if the user decides it is best for an application to generate write hints directly to get the best performance, the user would mount the filesystem with FS generation of write hints disabled. The FS then acts as a passthrough for write hints.
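
As a sketch of how small that decision could be, something along these lines in plain C. The struct, the field name, and the function are hypothetical; no filesystem has this mount option today as far as this mail is concerned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-mount state; the option name is an assumption. */
struct fs_mount_opts {
	bool fs_write_hints;  /* true: the FS generates hints (proposed default) */
};

/*
 * Pick the hint that goes down to the block layer for one write.
 * With the option enabled the FS is in control and its own hint wins;
 * with it disabled the FS passes the application's hint through untouched.
 */
static inline uint32_t fs_choose_write_hint(const struct fs_mount_opts *opts,
					    uint32_t app_hint,
					    uint32_t fs_hint)
{
	if (opts->fs_write_hints)
		return fs_hint;
	return app_hint;
}
```

Only one layer generates hints for a given mount, which is the point: the two never mix on the same storage.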

Regards,

Pierre
> -----Original Message-----
> From: Keith Busch <kbusch at kernel.org>
> Sent: Tuesday, November 12, 2024 6:26 AM
> To: Christoph Hellwig <hch at lst.de>
> Cc: Kanchan Joshi <joshi.k at samsung.com>; Keith Busch
> <kbusch at meta.com>; linux-block at vger.kernel.org; linux-
> nvme at lists.infradead.org; linux-scsi at vger.kernel.org; linux-
> fsdevel at vger.kernel.org; io-uring at vger.kernel.org; axboe at kernel.dk;
> martin.petersen at oracle.com; asml.silence at gmail.com;
> javier.gonz at samsung.com
> Subject: [EXT] Re: [PATCHv11 0/9] write hints with nvme fdp and scsi streams
> 
> On Tue, Nov 12, 2024 at 02:34:39PM +0100, Christoph Hellwig wrote:
> > On Tue, Nov 12, 2024 at 06:56:25PM +0530, Kanchan Joshi wrote:
> > > IMO, passthrough propagation of hints/streams should continue to
> > > remain the default behavior as it applies on multiple filesystems.
> > > And more active placement by FS should rather be enabled by some opt
> > > in (e.g., mount option). Such opt in will anyway be needed for other
> > > reasons (like regression avoidance on a broken device).
> >
> > I feel like banging my head against the wall.  No, passing through
> > write streams is simply not acceptable without the file system being
> > in control.  I've said and explained this in detail about a dozend
> > times and the file system actually needing to do data separation for
> > it's own purpose doesn't go away by ignoring it.
> 
> But that's just an ideological decision that doesn't jive with how people use
> these. The applications know how they use their data better than the
> filesystem, so putting the filesystem in the way to force streams look like zones
> is just a unnecessary layer of indirection getting in the way.
