[LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems

Javier González javier.gonz at samsung.com
Wed Jan 17 23:12:25 PST 2024


On 18.01.2024 08:51, Dave Chinner wrote:
>On Wed, Jan 17, 2024 at 12:58:12PM +0100, Javier González wrote:
>> On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
>> > > On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz at samsung.com> wrote:
>> > > > How can FDP technology improve the efficiency and reliability of
>> > > > kernel-space file systems?
>> > >
>> > > This is an open problem. Our experience is that making data placement
>> > > decisions on the FS is tricky (beyond the obvious data / metadata). If
>> > > someone has a good use-case for this, I think it is worth exploring.
>> > > F2FS is a good candidate, but I am not sure FDP is of interest for
>> > > mobile - here ZUFS seems to be the current dominant technology.
>> > >
>> >
>> > If I understand the FDP technology correctly, I can see the benefits for
>> > file systems. :)
>> >
>> > For example, SSDFS is based on a segment concept and it has multiple
>> > types of segments (superblock, mapping table, segment bitmap, b-tree
>> > nodes, user data). So, as a first step, I can use hints to place different
>> > segment types into different reclaim units.
>>
>> Yes. This is what I meant with data / metadata. We have also looked into
>> using 1 RUH for metadata and making the rest available to applications. We
>> decided to start with a simple solution and extend it as we see users.
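
For reference, that split can already be expressed from user space today
through the per-file write lifetime hints. A minimal sketch (hypothetical
paths, and assuming headers new enough to expose F_SET_RW_HINT and the
RWH_WRITE_LIFE_* values; whether the driver/device maps these hints to
separate reclaim units is device dependent):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
        /* Lifetime hints per file; the NVMe driver/device may map them
         * to separate reclaim units. Paths are hypothetical. */
        uint64_t meta_hint = RWH_WRITE_LIFE_SHORT;  /* hot, frequently rewritten */
        uint64_t data_hint = RWH_WRITE_LIFE_LONG;   /* colder user data */

        int meta_fd = open("/mnt/test/metadata.img", O_WRONLY);
        int data_fd = open("/mnt/test/data.img", O_WRONLY);
        if (meta_fd < 0 || data_fd < 0)
                return 1;

        if (fcntl(meta_fd, F_SET_RW_HINT, &meta_hint) < 0)
                perror("F_SET_RW_HINT (metadata)");
        if (fcntl(data_fd, F_SET_RW_HINT, &data_hint) < 0)
                perror("F_SET_RW_HINT (data)");
        return 0;
}
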
>
>XFS has an abstract type definition for metadata that it uses to
>prioritise cache reclaim (i.e. classifies what metadata is more
>important/hotter) and that could easily be extended to IO hints
>to indicate placement.
>
>We also have a separate journal IO path, and that is probably the
>hottest LBA region of the filesystem (circular overwrite region)
>which would stand to have its own classification as well.
>
>We've long talked about making use of write IO hints for separating
>these things out, but requiring 10+ IO hint channels just for
>filesystem metadata to be robustly classified has been a show
>stopper. Doing nothing is almost always better than doing placement
>hinting poorly.

I fully agree with the last statement.

In my experience, if we do anything at all, it is better to target 2 or 3
data streams aimed at what we expect to be the largest metric gap (be it
data hotness, size, etc.).

The difficult part is identifying the small changes that bring a large
share of the benefit without getting into corner cases that consume most
of the effort.
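
As a concrete (purely hypothetical) sketch of what "2 or 3 streams" could
look like, collapsing an internal classification into a few write lifetime
hints rather than requiring one channel per metadata type; the enum and
mapping below are illustrative, not taken from any existing filesystem:

#define _GNU_SOURCE
#include <fcntl.h>      /* RWH_WRITE_LIFE_* lifetime hint values */
#include <stdint.h>

/* Hypothetical I/O classification. */
enum fs_io_class {
        FS_CLASS_JOURNAL,       /* circular overwrite region, hottest */
        FS_CLASS_METADATA,      /* btree nodes, bitmaps, inodes, ... */
        FS_CLASS_DATA,          /* user data, coldest on average */
};

/* Collapse the internal classes into three lifetime hints instead of
 * one hint channel per metadata type. */
static uint64_t fs_class_to_rw_hint(enum fs_io_class c)
{
        switch (c) {
        case FS_CLASS_JOURNAL:
                return RWH_WRITE_LIFE_SHORT;
        case FS_CLASS_METADATA:
                return RWH_WRITE_LIFE_MEDIUM;
        default:
                return RWH_WRITE_LIFE_LONG;
        }
}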

>
>> > Technically speaking, any file system can place different types of metadata in
>> > different reclaim units. However, user data is a slightly trickier case. Potentially,
>> > file system logic can track “hotness” or frequency of updates of some user data
>> > and try to direct the different types of user data to different reclaim units.
>
>*cough*
>
>We already do this in the LBA space via the filesystem allocators.
>It's often configurable and generally called "allocation policies".
>
>> > But, from another point of view, we have folders in the file system namespace.
>> > If an application can place different types of data in different folders, then, technically
>> > speaking, file system logic can place the contents of different folders into different
>> > reclaim units. But the application needs to follow some “discipline” to store different
>> > types of user data (different “hotness”, for example) in different folders.
>
>Yup, XFS does this "physical locality is determined by parent
>directory" separation by default (the inode64 allocation policy).
>Every new directory inode is placed in a different allocation group
>(LBA space) based on a rotor mechanism. All the files within that
>directory are kept local to the directory (i.e. in the same AG/LBA
>space) as much as possible.
>
>Most filesystems have LBA locality policies like this because it is
>highly efficient on physical seek latency limited storage hardware.
>i.e. the storage hardware we've mostly been using since the early
>1980s.
>
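
Right. For readers not familiar with inode64, the rotor behaviour looks
roughly like this (a simplified sketch, not the actual XFS inode
allocator code):

#include <stdint.h>

/* Simplified illustration of the rotor idea: each new directory lands
 * in the next allocation group; files inherit the parent's AG. */
struct sb_info {
        uint32_t agcount;       /* number of allocation groups */
        uint32_t agrotor;       /* next AG to use for a new directory */
};

static uint32_t pick_ag(struct sb_info *sb, int is_dir, uint32_t parent_ag)
{
        if (is_dir)
                return sb->agrotor++ % sb->agcount;     /* spread directories */
        return parent_ag;       /* keep files local to their parent directory */
}
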
>We could make allocation groups have different reclaim units,
>but then we are talking about needing an arbitrary number of
>different IO hints - XFS supports ~2^31 AGs if the filesystem is
>large enough, and there's no way we're going to try to support that
>many IO hints (software or hardware) in the foreseeable future.
>
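
Agree. Just to illustrate the mismatch: with only a handful of hints, any
AG-to-hint mapping has to fold many unrelated AGs together (hypothetical
sketch):

#include <stdint.h>

/* With a small hint space, unrelated AGs inevitably collapse into the
 * same hint, so the locality information the filesystem already has is
 * mostly lost. NR_IO_HINTS is an assumed, made-up number. */
#define NR_IO_HINTS 4

static uint32_t ag_to_hint(uint32_t agno)
{
        return agno % NR_IO_HINTS;      /* AG 0, 4, 8, ... all share hint 0 */
}
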
>If devices want to try to classify related data themselves, by
>using LBA locality internally to classify relationships below the
>level of IO hints, then that would be a much closer match to how
>filesystems have traditionally structured the data and metadata on
>disk. Related data and metadata tend to get written to the same LBA
>regions because that's the fastest way to access related data and
>metadata on seek-limited hardware.
>
>Yeah, I know that these are SSDs we are talking about and they
>aren't seek limited, but when we already have filesystem
>implementations that try to clump related things to nearby LBA
>spaces, it might be best to try to leverage this behaviour rather
>than try to rely on kernel and userspace to correctly provide hints
>about their data patterns.

+1
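
To make the idea concrete, something like the following (purely
illustrative; not how any particular SSD firmware works) would derive an
implicit placement class from LBA locality alone, with no host hints:

#include <stdint.h>

/* Region size and class count are made-up numbers, not any real
 * device's policy. */
#define LBA_REGION_SHIFT        18      /* 2^18 LBAs per region (hypothetical) */
#define NR_PLACEMENT_CLASSES    8       /* internal reclaim-unit groups (hypothetical) */

static uint32_t classify_write(uint64_t slba)
{
        uint64_t region = slba >> LBA_REGION_SHIFT;

        /* Writes landing in the same LBA region - which filesystems
         * already keep local for related data and metadata - share a
         * placement class. */
        return (uint32_t)(region % NR_PLACEMENT_CLASSES);
}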


