[PATCH v7 0/3] FDP and per-io hints

Christoph Hellwig hch at lst.de
Tue Oct 15 08:22:57 PDT 2024


On Tue, Oct 15, 2024 at 09:09:20AM -0600, Keith Busch wrote:
> On Tue, Oct 15, 2024 at 07:50:06AM +0200, Christoph Hellwig wrote:
> > 1) While the current per-file temperature hints interface is not perfect
> > it is okay and makes sense to reuse until we need something more fancy.
> > We make good use of it in f2fs and the upcoming zoned xfs code to help
> > with data placement and have numbers to show that it helps.
> 
> So we're okay to proceed with patch 1?

No, see points 3 and 4 below for why.  We'll need something like the
interface suggested by me in point 4 and by you in reply to point 3
in the block layer, and then block/fops.c can implement the mapping on
top of that for drivers supporting it.

>  
> > 2) A per-I/O interface to set these temperature hints conflicts badly
> > with how placement works in file systems.  If we have an urgent need
> > for it on the block device it needs to be opt-in by the file operations
> > so it can be enabled on block device, but not on file systems by
> > default.  This way you can implement it for block device, but not
> > provide it on file systems by default.  If a given file system finds
> > a way to implement it, it can still opt into implementing it of course.
> 
> If we add a new fop_flag that only block fops enables, then it's okay?

The flag is just one part of it.  Of course it needs to be discoverable
from userspace in one way or another, and the marshalling of the flag
needs to be controlled by the file system / fops instance.
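
A minimal userspace sketch of the opt-in idea, not code from this
series: the fops instance advertises a capability bit, and a per-I/O
hint is only accepted when that bit is set.  All names here
(SKETCH_FOP_PER_IO_HINT, struct sketch_fops, the helpers) are
hypothetical and do not correspond to existing kernel symbols.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit; not an existing kernel FOP_* flag. */
#define SKETCH_FOP_PER_IO_HINT (1u << 0)

/* Stand-in for a file_operations instance carrying opt-in flags. */
struct sketch_fops {
	uint32_t fop_flags;
};

/* Discovery side: what a userspace-visible query would report. */
static bool sketch_supports_per_io_hint(const struct sketch_fops *fops)
{
	return (fops->fop_flags & SKETCH_FOP_PER_IO_HINT) != 0;
}

/*
 * Submission side: accept a per-I/O hint only if the fops instance
 * opted in.  A hint of 0 means "no hint" and is always accepted.
 * Returns 0 on success, -EOPNOTSUPP if the hint is not supported.
 */
static int sketch_set_io_hint(const struct sketch_fops *fops,
			      uint16_t hint, uint16_t *out)
{
	if (hint == 0) {
		*out = 0;
		return 0;
	}
	if (!sketch_supports_per_io_hint(fops))
		return -EOPNOTSUPP;
	*out = hint;
	return 0;
}
```

With this shape the block device fops would set the bit, while a file
system fops would leave it clear until it has its own way to handle
the hint.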

> > 3) Mapping from temperature hints to separate write streams needs to
> > happen above the block layer, because file systems need to be in
> > control of it to do intelligent placement.  That means if you want to
> > map from temperature hints to stream separation it needs to be
> > implemented at the file operation layer, not in the device driver.
> > The mapping implemented in this series is probably only useful for
> > block devices.  Maybe if dumb file systems want to adopt it, it could
> > be split into library code for reuse, but as usual that's probably
> > best done only when actually needed.
> 
> IMO, I don't even think the io_uring per-io hint needs to be limited to
> the fcntl lifetime values. It could just be a u16 value opaque to the
> block layer that just gets forwarded to the device.

Well, that's what I've been arguing for all the time, and what Kanchan's
previous series was working towards.  It's not quite that trivial, as
we need a bit more than just the stream, e.g. a way to discover how many
of them exist.
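
A rough userspace sketch of the split described here, under stated
assumptions: the lower layer only exposes how many streams exist and
validates an opaque stream id, while the caller above it (standing in
for block/fops.c) maps lifetime hints onto that range.  The struct,
the enum and the clamping policy are all hypothetical; this is not the
mapping implemented in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Lifetime hints in increasing expected lifetime (illustrative enum). */
enum sketch_life_hint {
	SKETCH_LIFE_NOT_SET = 0,
	SKETCH_LIFE_NONE,
	SKETCH_LIFE_SHORT,
	SKETCH_LIFE_MEDIUM,
	SKETCH_LIFE_LONG,
	SKETCH_LIFE_EXTREME,
};

/* Hypothetical discovery field: number of write streams, 0 if none. */
struct sketch_queue_limits {
	uint16_t nr_write_streams;
};

/* Lower layer: stream ids are opaque, 0 means "no stream". */
static bool sketch_stream_valid(const struct sketch_queue_limits *lim,
				uint16_t stream)
{
	return stream <= lim->nr_write_streams;
}

/*
 * Upper layer: map a lifetime hint to a stream in 1..nr_write_streams.
 * Hints beyond the available streams share the last one (an assumed
 * policy, chosen only to keep the sketch short).
 */
static uint16_t sketch_hint_to_stream(const struct sketch_queue_limits *lim,
				      enum sketch_life_hint hint)
{
	uint16_t stream;

	if (lim->nr_write_streams == 0 || hint <= SKETCH_LIFE_NONE)
		return 0;

	/* SHORT -> 1, MEDIUM -> 2, LONG -> 3, EXTREME -> 4 */
	stream = (uint16_t)(hint - SKETCH_LIFE_NONE);
	if (stream > lim->nr_write_streams)
		stream = lim->nr_write_streams;
	return stream;
}
```

The point of the split is that the mapping policy lives entirely in
the caller, so a file system can replace it with its own placement
logic and still use the same stream count and validation.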

> > 4) To support this, the block layer, that is bios and requests, needs
> > to support a notion of stream separation.  Kanchan's previous series
> > had most of the bits for that, it just needs to be iterated on.
> > 
> > All of this could have probably been easily done in the time spent on
> > this discussion.
---end quoted text---


