[PATCH v7 0/3] FDP and per-io hints
Christoph Hellwig
hch at lst.de
Thu Oct 3 22:31:21 PDT 2024
On Thu, Oct 03, 2024 at 04:14:57PM -0600, Jens Axboe wrote:
> On 10/3/24 6:54 AM, Christoph Hellwig wrote:
> > For file: yes. The problem is when you have more files than buckets on
> > the device or file systems. Typical enterprise SSDs support somewhere
> > between 8 and 16 write streams, and there typically is more data than
> > that. So trying to group it somehow is a good idea as not all files can
> > have their own bucket.
> >
> > Allowing this inside a file like done in this patch set on the other
> > hand is pretty crazy.
>
> I do agree that per-file hints are not ideal. In the spirit of making
> some progress, how about we just retain per-io hints initially? We can
> certainly make that work over dio. Yes buffered IO won't work initially,
> but at least we're getting somewhere.

Huh? Per-I/O hints at the syscall level are the problem (see also the
reply from Martin). Per-file hints make total sense, but we need the
file system in control.

The real problem is further down the stack. For the SCSI temperature
hints, just passing them on makes sense. But when you map to some kind
of stream separation in the device, no matter if that is streams, FDP,
or the various kinds of streams we don't even support in things like CF
and SD cards, the driver is not the right place to map temperature hints
to streams. That requires some kind of intelligence. It could be
dirt simple and just do a best-effort mapping of the temperature
hints 1:1 to separate write streams, or do a little mapping if there
are not enough of them, which should work fine for a raw block device.
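
A minimal sketch of what such a "dirt simple" best-effort mapping could
look like for a raw block device. Everything here is hypothetical and
not taken from the patch set: the WriteLife enum is only modelled on the
Linux RWH_WRITE_LIFE_* lifetime hints, and the choice of stream 0 as the
default for unhinted data is an assumption made for illustration.

```python
from enum import IntEnum


class WriteLife(IntEnum):
    # Hypothetical enum, modelled on the Linux RWH_WRITE_LIFE_* hints.
    NOT_SET = 0
    NONE = 1
    SHORT = 2
    MEDIUM = 3
    LONG = 4
    EXTREME = 5


# The hints that actually carry a temperature, coldest-lived last.
TEMPERATURES = [WriteLife.SHORT, WriteLife.MEDIUM,
                WriteLife.LONG, WriteLife.EXTREME]


def hint_to_stream(hint: WriteLife, nr_streams: int) -> int:
    """Best-effort mapping of a temperature hint to a write stream index.

    Assumption: stream 0 takes all unhinted data, streams 1.. take
    hinted data.
    """
    if nr_streams < 1:
        raise ValueError("device must expose at least one write stream")
    if hint not in TEMPERATURES or nr_streams == 1:
        return 0

    usable = nr_streams - 1          # streams left for hinted data
    rank = TEMPERATURES.index(hint)  # 0 (SHORT) .. 3 (EXTREME)

    if usable >= len(TEMPERATURES):
        # Enough streams: plain 1:1 mapping.
        return 1 + rank
    # Not enough streams: fold neighbouring temperatures together.
    return 1 + rank * usable // len(TEMPERATURES)
```

With eight streams each temperature gets its own stream; with three
streams SHORT and MEDIUM share one stream, and LONG and EXTREME share
the other.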

But once we have a file system, things get more complicated:

 - the file system will want its own streams for metadata and GC
 - even with that, on beefy enough hardware you can have more streams
   than temperature levels, and the file system can and should
   do intelligent placement (usually based on files)
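
To illustrate the two points above, here is a rough sketch of a file
system owning the stream assignment. The StreamAllocator class, its
attribute names, and the round-robin placement policy are all
assumptions made up for this example; a real file system would pick a
smarter grouping policy.

```python
class StreamAllocator:
    """Hypothetical file-system-side owner of the device write streams."""

    def __init__(self, nr_streams: int):
        if nr_streams < 3:
            raise ValueError("need metadata, GC and at least one data stream")
        # The file system keeps streams for its own use first.
        self.metadata_stream = 0
        self.gc_stream = 1
        # Whatever is left is available for file data.
        self.data_streams = list(range(2, nr_streams))
        self._by_file = {}

    def stream_for_file(self, inode: int) -> int:
        """Return the data stream for a file, assigning one on first use.

        Placement is per file, not per I/O: every write to the same
        file lands in the same stream. Files share streams once there
        are more files than data streams (simple round-robin here).
        """
        if inode not in self._by_file:
            slot = len(self._by_file) % len(self.data_streams)
            self._by_file[inode] = self.data_streams[slot]
        return self._by_file[inode]
```

The point of the sketch is only that the decision sits in one place
that knows about metadata, GC and files, which a driver-level
temperature mapping cannot know.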

Or to summarize: the per-file temperature hints make sense as a user
interface. Per-I/O hints tend to be really messy, at least if a file
system is involved. Mapping the temperatures to separate write streams
in the driver does not scale even to the most trivial write-stream-aware
file system implementations.

And for anyone who followed the previous discussions of the patches,
none of this should be new; each point has been made at least three
times before.