[PATCH v7 0/3] FDP and per-io hints

Javier Gonzalez javier.gonz at samsung.com
Thu Oct 10 05:22:32 PDT 2024


On 10.10.2024 11:20, Christoph Hellwig wrote:
>On Thu, Oct 10, 2024 at 09:13:27AM +0200, Javier Gonzalez wrote:
>> Is this because RocksDB already does segregation per file itself? Are
>> you doing something specific on XFS or using your knowledge of RocksDB to
>> map files with an "unwritten" protocol in the middle?
>
>XFS doesn't really do anything smart at all except for grouping files
>with similar temperatures, but Hans can probably explain it in more
>detail.  So yes, this relies on the application doing the data separation
>and using the most logical vehicle for it: files.

This makes sense. Agree.

>
>>
>>    In this context, we have collected data both using FDP natively in
>>    RocksDB and using the temperatures. Both look very good, because both
>>    are initiated by RocksDB, and the FS just passes the hints directly
>>    to the driver.
>>
>> I ask this to understand if this is the FS responsibility or the
>> application's one. Our work points more to letting applications use the
>> hints (as the use-cases are power users, like RocksDB). I agree with you
>> that a FS could potentially make an improvement for legacy applications
>> - we have not focused much on these though, so I trust your insights on
>> it.
>
>As mentioned multiple times before in this thread this absolutely
>depends on the abstraction level of the application.  If the application
>works on a raw device without a file system it obviously needs very
>low-level control.  And in my opinion passthrough is by far the best
>interface for that level of control. 

Passthru is great for prototyping and getting insights on end-to-end
applicability. We see though that it is difficult to get a full solution
based on it, unless people implement a user-space layer tailored to their
use-case (e.g., a version of SPDK's bdev). After the POC phase, most folks
that can use passthru prefer to move to block - with a validated
use-case it should be easier to get things upstream.

This is exactly where we are now.

>If the application is using a
>file system there is no better basic level abstraction than a file,
>which can then be enhanced with a relatively small amount of additional
>information going both ways: the file system telling the application
>what good file sizes and write patterns are, and the application telling
>the file system what files are good candidates to merge into the same
>write stream if the file system has to merge multiple actively written
>to files into a write stream.  Trying to do low-level per I/O hints
>on top of a file system is a recipe for trouble because you now have
>two entities fighting over placement control.

For file, I agree with you.
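
As a point of reference for the per-file case, Linux already exposes a
per-inode write lifetime hint through fcntl(F_SET_RW_HINT). The sketch
below is illustrative only and is not taken from this patch series: the
numeric constants are copied from include/uapi/linux/fcntl.h, the helper
names set_write_lifetime() and get_write_lifetime() are hypothetical, and
whether a given file system or device acts on the hint is not guaranteed.

```python
import fcntl
import struct
import tempfile

# Values from include/uapi/linux/fcntl.h (F_LINUX_SPECIFIC_BASE is 1024).
# They are spelled out here instead of assuming the fcntl module exports them.
F_GET_RW_HINT = 1024 + 11
F_SET_RW_HINT = 1024 + 12

# Write lifetime hints, from the same header.
RWH_WRITE_LIFE_NOT_SET = 0
RWH_WRITE_LIFE_NONE = 1
RWH_WRITE_LIFE_SHORT = 2
RWH_WRITE_LIFE_MEDIUM = 3
RWH_WRITE_LIFE_LONG = 4
RWH_WRITE_LIFE_EXTREME = 5


def set_write_lifetime(fd, hint):
    """Hypothetical helper: set the per-inode write lifetime hint.

    Returns True on success, False if the kernel or platform rejects it.
    """
    try:
        # The kernel reads a 64-bit value through the pointer argument.
        fcntl.fcntl(fd, F_SET_RW_HINT, struct.pack("=Q", hint))
    except OSError:
        return False
    return True


def get_write_lifetime(fd):
    """Hypothetical helper: read the hint back, or None if unsupported."""
    try:
        raw = fcntl.fcntl(fd, F_GET_RW_HINT, struct.pack("=Q", 0))
    except OSError:
        return None
    return struct.unpack("=Q", raw)[0]


if __name__ == "__main__":
    # An application that separates data per file would tag each file once,
    # for example a short-lived log file versus a long-lived data file.
    with tempfile.TemporaryFile() as f:
        if set_write_lifetime(f.fileno(), RWH_WRITE_LIFE_SHORT):
            print("hint now:", get_write_lifetime(f.fileno()))
        else:
            print("write lifetime hints not supported here")
```

The hint is attached to the file rather than to each I/O, so the
application states its intent once and placement stays with the layers
below it.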

If you saw the comments from Christian on the inode space, there are a
few plumbing challenges. Do you have any patches we could look at?





