[PATCH v7 0/3] FDP and per-io hints
Christoph Hellwig
hch at lst.de
Thu Oct 17 08:23:37 PDT 2024
On Thu, Oct 17, 2024 at 08:05:38PM +0530, Kanchan Joshi wrote:
> Seems per-I/O hints are not getting the love they deserve.
> Apart from the block device, the usecase is when all I/Os of VM (or
> container) are to be grouped together or placed differently.
But that assumes the file system could actually support it. Which
is hard when you don't assume the file system is simply a passthrough
entity, which will not give you great results.
> > 2) A per-I/O interface to set these temperature hints conflicts badly
> > with how placement works in file systems. If we have an urgent need
> > for it on the block device it needs to be opt-in by the file operations
> > so it can be enabled on block device, but not on file systems by
> > default. This way you can implement it for block device, but not
> > provide it on file systems by default. If a given file system finds
> > a way to implement it it can still opt into implementing it of course.
>
> Why do you see this as something that is so different across filesystems
> that they would need to "find a way to implement"?
If you want to do useful stream separation you need to write data
sequentially into the stream. Now with streams or FDP that does not
actually imply sequentially in LBA space, but if you want the file
system to not actually deal with fragmentation from hell, and to be
able to easily track what is grouped together you really want it sequentially
in the LBA space as well. In other words, any kind of write placement
needs to be intimately tied to the file system block allocator.
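
To illustrate the point about the allocator, here is a minimal userspace
toy model, not kernel code. It assumes each stream owns one fixed,
contiguous LBA region with its own write pointer, so everything written
to a stream is also sequential in LBA space. All names and sizes
(toy_alloc, NR_STREAMS, REGION_BLOCKS, ...) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_STREAMS     4u          /* hypothetical number of placement streams */
#define REGION_BLOCKS  1024u       /* hypothetical size of one stream's LBA region */
#define ALLOC_FAIL     UINT64_MAX  /* returned when the request cannot be placed */

/* One write pointer per stream: blocks already handed out in its region. */
struct toy_alloc {
	uint64_t write_ptr[NR_STREAMS];
};

static void toy_alloc_init(struct toy_alloc *a)
{
	for (unsigned int i = 0; i < NR_STREAMS; i++)
		a->write_ptr[i] = 0;
}

/*
 * Allocate nr_blocks for the given stream. The extent always starts at
 * the stream's current write pointer, so consecutive allocations for one
 * stream are contiguous in LBA space and never interleave with another
 * stream's data. Returns the first LBA, or ALLOC_FAIL.
 */
static uint64_t toy_alloc_blocks(struct toy_alloc *a, unsigned int stream,
				 uint64_t nr_blocks)
{
	uint64_t start;

	if (stream >= NR_STREAMS || nr_blocks == 0)
		return ALLOC_FAIL;
	if (nr_blocks > REGION_BLOCKS - a->write_ptr[stream])
		return ALLOC_FAIL;	/* region full; a real allocator would pick a new one */

	start = (uint64_t)stream * REGION_BLOCKS + a->write_ptr[stream];
	a->write_ptr[stream] += nr_blocks;
	return start;
}
```

In this model the stream id picks the region, i.e. the placement decision
and the block allocation are the same step. A hint that is merely passed
down next to an unrelated allocator decision gives no such guarantee.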
> Both per-file and per-io hints are supplied by userspace. Inode and
> kiocb only happen to be the means to receive the hint information.
> FS is free to use this information (iff it wants) or simply forward this
> down.
As mentioned above just passing it down is not actually very useful.
It might give you nice benchmark numbers when you basically reimplement
space management in userspace on a fully preallocated file, but for that
you're better off just using the block device. If you actually want
to treat the files as files you need full file system involvement.
> Per-file hint just gets stored (within inode) without individual FS
> involvement. Per-io hint follows the same model (i.e., it is set by
> upper layer like io_uring/aio) and uses kiocb to store the hint. It does
> not alter the stored inode hint value!
Yes, and now you'll get complaints that the file system ignores it
when it can't properly support it. This is why we need a per-fop
opt in.
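
A rough sketch of what such an opt-in could look like, as a userspace
toy model rather than a patch. The flag name TOY_FOP_PER_IO_HINT and all
toy_* structures are hypothetical. Rejecting an unsupported hint with
-EOPNOTSUPP (instead of silently dropping it) is an assumption made here
so that userspace can tell the difference; the thread does not settle
that policy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TOY_FOP_PER_IO_HINT	(1u << 0)	/* hypothetical per-fop opt-in flag */
#define TOY_HINT_NONE		0u		/* no per-I/O hint supplied */

/* Stand-in for struct file_operations: only the opt-in flags matter here. */
struct toy_file_ops {
	unsigned int fop_flags;
};

/* Stand-in for struct kiocb: the file's ops plus the per-I/O hint. */
struct toy_kiocb {
	const struct toy_file_ops *f_op;
	uint8_t write_hint;
};

/*
 * Check done by the submitting layer (think io_uring/aio) before it
 * stores a per-I/O hint: the hint is accepted only if the file's
 * operations opted in. I/O without a hint is always fine.
 */
static int toy_check_per_io_hint(const struct toy_kiocb *iocb)
{
	if (iocb->write_hint == TOY_HINT_NONE)
		return 0;
	if (!(iocb->f_op->fop_flags & TOY_FOP_PER_IO_HINT))
		return -EOPNOTSUPP;	/* assumed policy: reject, don't ignore */
	return 0;
}
```

With this, the block device file operations would set the flag, while a
file system leaves it clear until its allocator can actually make use
of the hint.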
> The generic code (like fs/direct-io.c, fs/iomap/direct-io.c etc.,)
> already forwards the incoming hints, without any intelligence.
Yes, and that is a problem. We stopped doing that, but Samsung sneaked
some of this back in recently as I noticed.
> Overall, I do not see the conflict. It's all user-driven. No?
I have the gut feeling that you've just run benchmarks on image files
emulating block devices and not actually tried real file system workloads
based on this unfortunately.