[LSF/MM/BPF TOPIC] : Flexible Data Placement (FDP) availability for kernel space file systems
Keith Busch
kbusch at kernel.org
Fri Jan 19 12:49:37 PST 2024
On Thu, Jan 18, 2024 at 08:51:37AM +1100, Dave Chinner wrote:
> On Wed, Jan 17, 2024 at 12:58:12PM +0100, Javier González wrote:
> > On 16.01.2024 11:39, Viacheslav Dubeyko wrote:
> > > > On Jan 15, 2024, at 8:54 PM, Javier González <javier.gonz at samsung.com> wrote:
> > > > > How FDP technology can improve efficiency and reliability of
> > > > > kernel-space file system?
> > > >
> > > > This is an open problem. Our experience is that making data placement
> > > > decisions on the FS is tricky (beyond the obvious data / metadata). If
> > > > someone has a good use-case for this, I think it is worth exploring.
> > > > F2FS is a good candidate, but I am not sure FDP is of interest for
> > > > mobile - here ZUFS seems to be the current dominant technology.
> > > >
> > >
> > > If I understand the FDP technology correctly, I can see the benefits for
> > > file systems. :)
> > >
> > > For example, SSDFS is based on segment concept and it has multiple
> > > types of segments (superblock, mapping table, segment bitmap, b-tree
> > > nodes, user data). So, at first, I can use hints to place different segment
> > > types into different reclaim units.
> >
> > Yes. This is what I meant by data / metadata. We have also looked
> > into using one RUH for metadata and making the rest available to
> > applications. We decided to go with a simple solution to start with
> > and extend it as we see users.
>
> XFS has an abstract type definition for metadata that it uses to
> prioritise cache reclaim (i.e. it classifies which metadata is more
> important/hotter), and that could easily be extended to IO hints
> to indicate placement.
>
> We also have a separate journal IO path, and that is probably the
> hottest LBA region of the filesystem (a circular overwrite region),
> which would stand to have its own classification as well.
Filesystem metadata is spatially small in the LBA range, but it seems
to have a higher overwrite frequency than other data, so this could be
a great fit for FDP. Some of my _very_ early analysis, though,
indicates REQ_META is too coarse to really get the benefits, so
finer-grained separation through new flags or hints should help. A
rough sketch of what that could look like is below.
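To be concrete about "finer-grained": something like the following
hypothetical classification, mapped onto the per-write lifetime hints
the block layer already carries. None of this exists today; the enum
and the mapping policy are illustrative only.

  /* Hypothetical, illustration only: a finer-grained metadata
   * classification than the single REQ_META flag. A filesystem could
   * map these onto FDP reclaim unit handles via write hints.
   */
  enum meta_hint {
          META_HINT_NONE,         /* user data, default placement */
          META_HINT_JOURNAL,      /* circular overwrite region, hottest */
          META_HINT_BITMAP,       /* allocation bitmaps, frequent overwrite */
          META_HINT_BTREE,        /* b-tree / index nodes */
          META_HINT_SUPER,        /* superblock and similar, small and hot */
  };

  /* Map the classification onto the existing enum rw_hint values;
   * the policy here is made up for the example.
   */
  static inline enum rw_hint meta_to_rw_hint(enum meta_hint h)
  {
          switch (h) {
          case META_HINT_JOURNAL:
                  return WRITE_LIFE_SHORT;
          case META_HINT_BITMAP:
          case META_HINT_SUPER:
                  return WRITE_LIFE_MEDIUM;
          case META_HINT_BTREE:
                  return WRITE_LIFE_LONG;
          default:
                  return WRITE_LIFE_NOT_SET;
          }
  }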
> We've long talked about making use of write IO hints for separating
> these things out, but requiring 10+ IO hint channels just for
> filesystem metadata to be robustly classified has been a show
> stopper. Doing nothing is almost always better than doing placement
> hinting poorly.
Yeah, a totally degenerate application could make things worse than
not using these write hints at all. NVMe's FDP has a standard-defined
feedback mechanism through log pages to see how well you're doing with
respect to write amplification. If we assume applications using this
optimization are acting in good faith, we should be able to tune the
use cases. The FDP abstractions seem appropriate for generic solutions
that aren't tailored to any one vendor.
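For anyone who wants to poke at that feedback loop: the FDP statistics
log page (LID 0x22, TP4146) reports host vs. media bytes written, from
which you can estimate WAF. Below is a minimal userspace sketch using
the stock admin passthrough ioctl; the struct layout matches my reading
of the spec, but verify the field offsets and your endurance group ID
before trusting the numbers.

  /* Sketch: read the FDP statistics log page (LID 0x22) through the
   * generic admin passthrough ioctl and estimate WAF as MBMW / HBMW.
   * Layout per my reading of TP4146 -- verify before use.
   */
  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>
  #include <fcntl.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/nvme_ioctl.h>

  struct fdp_stats_log {
          uint8_t hbmw[16];       /* host bytes with metadata written */
          uint8_t mbmw[16];       /* media bytes with metadata written */
          uint8_t mbe[16];        /* media bytes erased */
          uint8_t rsvd[16];
  };

  /* Fields are 128-bit little-endian; the low 64 bits are plenty
   * here. Assumes a little-endian host.
   */
  static double le128_lo64(const uint8_t *v)
  {
          uint64_t lo;

          memcpy(&lo, v, sizeof(lo));
          return (double)lo;
  }

  int main(int argc, char **argv)
  {
          struct fdp_stats_log log = { 0 };
          struct nvme_admin_cmd cmd = { 0 };
          double hbmw;
          int fd;

          if (argc < 2)
                  return 1;
          fd = open(argv[1], O_RDONLY);   /* e.g. /dev/nvme0 */
          if (fd < 0)
                  return 1;

          cmd.opcode = 0x02;              /* Get Log Page */
          cmd.nsid = 0xffffffff;
          cmd.addr = (uint64_t)(uintptr_t)&log;
          cmd.data_len = sizeof(log);
          /* cdw10: LID 0x22, NUMDL (0-based dwords) in bits 31:16 */
          cmd.cdw10 = 0x22 | (((sizeof(log) / 4) - 1) << 16);
          /* cdw11 bits 31:16: Log Specific Identifier = endurance
           * group ID; 1 assumed here. */
          cmd.cdw11 = 1 << 16;

          if (ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd) != 0)
                  return 1;

          hbmw = le128_lo64(log.hbmw);
          if (hbmw)
                  printf("WAF estimate: %.3f\n",
                         le128_lo64(log.mbmw) / hbmw);
          close(fd);
          return 0;
  }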