[PATCH v4 00/39] netfs, afs, 9p: Delegate high-level I/O to netfslib

Jeff Layton jlayton at kernel.org
Thu Dec 14 06:11:53 PST 2023


On Wed, 2023-12-13 at 15:23 +0000, David Howells wrote:
> Hi Jeff, Steve, Dominique,
> 
> I have been working on my netfslib helpers to the point that I can run
> xfstests on AFS to completion (both with write-back buffering and, with a
> small patch, write-through buffering in the pagecache).  I have a patch for
> 9P, but am currently unable to test it.
> 
> The patches remove a little over 800 lines from AFS, 300 from 9P, albeit with
> around 3000 lines added to netfs.  Hopefully, I will be able to remove a bunch
> of lines from Ceph too.
> 
> I've split the CIFS patches out to a separate branch, cifs-netfs, where a
> further 2000+ lines are removed.  I can run a certain amount of xfstests on
> CIFS, though I'm running into ksmbd issues and not all the tests work
> correctly because of issues between fallocate and what the SMB protocol
> actually supports.
> 
> I've also dropped the content-crypto patches out for the moment as they're
> only usable by the ceph changes which I'm still working on.
> 
> The patch to use PG_writeback instead of PG_fscache for writing to the
> cache has also been deferred, pending 9p, afs, ceph and cifs all being
> converted.
> 
> The main aims of these patches are to get high-level I/O and knowledge of
> the pagecache out of the filesystem drivers as much as possible and to get
> rid, as much of possible, of the knowledge that pages/folios exist.
> 
> Further, I would like to see ->write_begin, ->write_end and ->launder_folio
> go away.
> 
> Features that are added by these patches to that which is already there in
> netfslib:
> 
>  (1) NFS-style (and Ceph-style) locking around DIO vs buffered I/O calls to
>      prevent these from happening at the same time.  mmap'd I/O can, of
>      necessity, happen at any time ignoring these locks.
> 
>  (2) Support for unbuffered I/O.  The data is kept in the bounce buffer and
>      the pagecache is not used.  This can be turned on with an inode flag.
> 
>  (3) Support for direct I/O.  This is basically unbuffered I/O with some
>      extra restrictions and no RMW.
> 
>  (4) Support for using a bounce buffer in an operation.  The bounce buffer
>      may be bigger than the target data/buffer, allowing for crypto
>      rounding.
> 
>  (5) ->write_begin() and ->write_end() are ignored in favour of merging all
>      of that into one function, netfs_perform_write(), thereby avoiding the
>      function pointer traversals.
> 
>  (6) Support for write-through caching in the pagecache.
>      netfs_perform_write() adds the pages is modifies to an I/O operation
>      as it goes and directly marks them writeback rather than dirty.  When
>      writing back from write-through, it limits the range written back.
>      This should allow CIFS to deal with byte-range mandatory locks
>      correctly.
> 
>  (7) O_*SYNC and RWF_*SYNC writes use write-through rather than writing to
>      the pagecache and then flushing afterwards.  An AIO O_*SYNC write will
>      notify of completion when the sub-writes all complete.
> 
>  (8) Support for write-streaming where modifed data is held in !uptodate
>      folios, with a private struct attached indicating the range that is
>      valid.
> 
>  (9) Support for write grouping, multiplexing a pointer to a group in the
>      folio private data with the write-streaming data.  The writepages
>      algorithm only writes stuff back that's in the nominated group.  This
>      is intended for use by Ceph to write is snaps in order.
> 
> (10) Skipping reads for which we know the server could only supply zeros or
>      EOF (for instance if we've done a local write that leaves a hole in
>      the file and extends the local inode size).
> 
> General notes:
> 
>  (1) The fscache module is merged into the netfslib module to avoid cyclic
>      exported symbol usage that prevents either module from being loaded.
> 
>  (2) Some helpers from fscache are reassigned to netfslib by name.
> 
>  (3) netfslib now makes use of folio->private, which means the filesystem
>      can't use it.
> 
>  (4) The filesystem provides wrappers to call the write helpers, allowing
>      it to do pre-validation, oplock/capability fetching and the passing in
>      of write group info.
> 
>  (5) I want to try flushing the data when tearing down an inode before
>      invalidating it to try and render launder_folio unnecessary.
> 
>  (6) Write-through caching will generate and dispatch write subrequests as
>      it gathers enough data to hit wsize and has whole pages that at least
>      span that size.  This needs to be a bit more flexible, allowing for a
>      filesystem such as CIFS to have a variable wsize.
> 
>  (7) The filesystem driver is just given read and write calls with an
>      iov_iter describing the data/buffer to use.  Ideally, they don't see
>      pages or folios at all.  A function, extract_iter_to_sg(), is already
>      available to decant part of an iterator into a scatterlist for crypto
>      purposes.
> 
> 
> 9P notes:
> 
>  (1) I haven't managed to test this as I haven't been able to get Ganesha
>      to work correctly with 9P.
> 
>  (2) Writes should now occur in larger-than-page-sized chunks.
> 
>  (3) It should be possible to turn on multipage folio support in 9P now.
> 
> 
> Changes
> =======
> ver #4)
>  - Slimmed down the branch:
>    - Split the cifs-related patches off to a separate branch (cifs-netfs)
>    - Deferred the content-encryption to the in-progress ceph changes.
>    - Deferred the use-PG_writeback rather than PG_fscache patch
>  - Rebased on a later linux-next with afs-rotation patches.
> 
> ver #3)
>  - Moved the fscache module into netfslib to avoid export cycles.
>  - Fixed a bunch of bugs.
>  - Got CIFS to pass as much of xfstests as possible.
>  - Added a patch to make 9P use all the helpers.
>  - Added a patch to stop using PG_fscache, but rather dirty pages on
>    reading and have writepages write to the cache.
> 
> ver #2)
>  - Folded the addition of NETFS_RREQ_NONBLOCK/BLOCKED into first patch that
>    uses them.
>  - Folded addition of rsize member into first user.
>  - Don't set rsize in ceph (yet) and set it in kafs to 256KiB.  cifs sets
>    it dynamically.
>  - Moved direct_bv next to direct_bv_count in struct netfs_io_request and
>    labelled it with a __counted_by().
>  - Passed flags into netfs_xa_store_and_mark() rather than two bools.
>  - Removed netfs_set_up_buffer() as it wasn't used.
> 
> David
> 
> Link: https://lore.kernel.org/r/20231013160423.2218093-1-dhowells@redhat.com/ # v1
> Link: https://lore.kernel.org/r/20231117211544.1740466-1-dhowells@redhat.com/ # v2
> 
> David Howells (39):
>   netfs, fscache: Move fs/fscache/* into fs/netfs/
>   netfs, fscache: Combine fscache with netfs
>   netfs, fscache: Remove ->begin_cache_operation
>   netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a
>     symlink
>   netfs: Move pinning-for-writeback from fscache to netfs
>   netfs: Add a procfile to list in-progress requests
>   netfs: Allow the netfs to make the io (sub)request alloc larger
>   netfs: Add a ->free_subrequest() op
>   afs: Don't use folio->private to record partial modification
>   netfs: Provide invalidate_folio and release_folio calls
>   netfs: Implement unbuffered/DIO vs buffered I/O locking
>   netfs: Add iov_iters to (sub)requests to describe various buffers
>   netfs: Add support for DIO buffering
>   netfs: Provide tools to create a buffer in an xarray
>   netfs: Add bounce buffering support
>   netfs: Add func to calculate pagecount/size-limited span of an
>     iterator
>   netfs: Limit subrequest by size or number of segments
>   netfs: Export netfs_put_subrequest() and some tracepoints
>   netfs: Extend the netfs_io_*request structs to handle writes
>   netfs: Add a hook to allow tell the netfs to update its i_size
>   netfs: Make netfs_put_request() handle a NULL pointer
>   netfs: Make the refcounting of netfs_begin_read() easier to use
>   netfs: Prep to use folio->private for write grouping and streaming
>     write
>   netfs: Dispatch write requests to process a writeback slice
>   netfs: Provide func to copy data to pagecache for buffered write
>   netfs: Make netfs_read_folio() handle streaming-write pages
>   netfs: Allocate multipage folios in the writepath
>   netfs: Implement support for unbuffered/DIO read
>   netfs: Implement unbuffered/DIO write support
>   netfs: Implement buffered write API
>   netfs: Allow buffered shared-writeable mmap through
>     netfs_page_mkwrite()
>   netfs: Provide netfs_file_read_iter()
>   netfs, cachefiles: Pass upper bound length to allow expansion
>   netfs: Provide a writepages implementation
>   netfs: Provide a launder_folio implementation
>   netfs: Implement a write-through caching option
>   netfs: Optimise away reads above the point at which there can be no
>     data
>   afs: Use the netfs write helpers
>   9p: Use netfslib read/write_iter
> 
>  Documentation/filesystems/netfs_library.rst   |   23 +-
>  MAINTAINERS                                   |    2 +-
>  fs/9p/vfs_addr.c                              |  352 +----
>  fs/9p/vfs_file.c                              |   89 +-
>  fs/9p/vfs_inode.c                             |    5 +-
>  fs/9p/vfs_super.c                             |   14 +-
>  fs/Kconfig                                    |    1 -
>  fs/Makefile                                   |    1 -
>  fs/afs/file.c                                 |  213 +--
>  fs/afs/inode.c                                |   26 +-
>  fs/afs/internal.h                             |   72 +-
>  fs/afs/super.c                                |    2 +-
>  fs/afs/write.c                                |  826 +----------
>  fs/cachefiles/internal.h                      |    2 +-
>  fs/cachefiles/io.c                            |   10 +-
>  fs/cachefiles/ondemand.c                      |    2 +-
>  fs/ceph/addr.c                                |   25 +-
>  fs/ceph/cache.h                               |   35 +-
>  fs/ceph/inode.c                               |    2 +-
>  fs/fs-writeback.c                             |   10 +-
>  fs/fscache/Kconfig                            |   40 -
>  fs/fscache/Makefile                           |   16 -
>  fs/fscache/internal.h                         |  277 ----
>  fs/netfs/Kconfig                              |   39 +
>  fs/netfs/Makefile                             |   22 +-
>  fs/netfs/buffered_read.c                      |  229 ++-
>  fs/netfs/buffered_write.c                     | 1247 +++++++++++++++++
>  fs/netfs/direct_read.c                        |  252 ++++
>  fs/netfs/direct_write.c                       |  170 +++
>  fs/{fscache/cache.c => netfs/fscache_cache.c} |    0
>  .../cookie.c => netfs/fscache_cookie.c}       |    0
>  fs/netfs/fscache_internal.h                   |   14 +
>  fs/{fscache/io.c => netfs/fscache_io.c}       |   42 +-
>  fs/{fscache/main.c => netfs/fscache_main.c}   |   25 +-
>  fs/{fscache/proc.c => netfs/fscache_proc.c}   |   23 +-
>  fs/{fscache/stats.c => netfs/fscache_stats.c} |    4 +-
>  .../volume.c => netfs/fscache_volume.c}       |    0
>  fs/netfs/internal.h                           |  288 ++++
>  fs/netfs/io.c                                 |  214 ++-
>  fs/netfs/iterator.c                           |   97 ++
>  fs/netfs/locking.c                            |  215 +++
>  fs/netfs/main.c                               |  110 ++
>  fs/netfs/misc.c                               |  260 ++++
>  fs/netfs/objects.c                            |   63 +-
>  fs/netfs/output.c                             |  478 +++++++
>  fs/netfs/stats.c                              |   31 +-
>  fs/nfs/Kconfig                                |    4 +-
>  fs/nfs/fscache.c                              |    7 -
>  fs/smb/client/cifsfs.c                        |    9 +-
>  fs/smb/client/file.c                          |   18 +-
>  fs/smb/client/fscache.c                       |    2 +-
>  include/linux/fs.h                            |    2 +-
>  include/linux/fscache.h                       |   45 -
>  include/linux/netfs.h                         |  176 ++-
>  include/linux/writeback.h                     |    2 +-
>  include/trace/events/afs.h                    |   31 -
>  include/trace/events/netfs.h                  |  155 +-
>  mm/filemap.c                                  |    1 +
>  58 files changed, 4197 insertions(+), 2123 deletions(-)
>  delete mode 100644 fs/fscache/Kconfig
>  delete mode 100644 fs/fscache/Makefile
>  delete mode 100644 fs/fscache/internal.h
>  create mode 100644 fs/netfs/buffered_write.c
>  create mode 100644 fs/netfs/direct_read.c
>  create mode 100644 fs/netfs/direct_write.c
>  rename fs/{fscache/cache.c => netfs/fscache_cache.c} (100%)
>  rename fs/{fscache/cookie.c => netfs/fscache_cookie.c} (100%)
>  create mode 100644 fs/netfs/fscache_internal.h
>  rename fs/{fscache/io.c => netfs/fscache_io.c} (86%)
>  rename fs/{fscache/main.c => netfs/fscache_main.c} (84%)
>  rename fs/{fscache/proc.c => netfs/fscache_proc.c} (58%)
>  rename fs/{fscache/stats.c => netfs/fscache_stats.c} (97%)
>  rename fs/{fscache/volume.c => netfs/fscache_volume.c} (100%)
>  create mode 100644 fs/netfs/locking.c
>  create mode 100644 fs/netfs/misc.c
>  create mode 100644 fs/netfs/output.c
> 

This all looks pretty great, David. Nice work! I had a few comments on a
few of them, but most are no big deal. It'd be nice to get this into
linux-next soon.

On the ones where I didn't have comments, you can add:

Reviewed-by: Jeff Layton <jlayton at kernel.org>



More information about the linux-afs mailing list