support block layer write streams and FDP

Christoph Hellwig hch at lst.de
Tue Nov 19 04:16:14 PST 2024


Hi all,

as a small interruption to the regularly scheduled culture wars, this series
implements a properly layered approach to block layer write streams.

This is based on Keith's "[PATCHv11 0/9] write hints with nvme fdp
and scsi streams" series, but doesn't bypass the file systems.

The rough idea is that block devices can expose a number of distinct
write streams, and bio submitters can pick one of them.  All bios that do
not pick an explicit write stream get the default one.  On the driver
layer this is wired up to NVMe FDP, but it should also work for SCSI
and NVMe streams if someone cares enough.  On the upper layer the only
consumers right now are the block device node file operations, which
support either an explicit stream selection through io_uring, or
mapping the old per-inode lifetime hints to streams.
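
To make the selection rule above concrete, here is a minimal user-space
sketch of it.  It is not code from the series: the struct, the function
names, the use of 0 as the "default stream" value and the particular
hint-to-stream mapping are all assumptions made for illustration.  Only
the ordering of the lifetime hints follows the existing RWH_WRITE_LIFE_*
values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; none of these names are taken from the series. */
#define MODEL_DEFAULT_STREAM 0	/* assumed: 0 means "no explicit stream" */

struct model_bdev {
	uint8_t nr_write_streams;	/* streams the device exposes, 0 if none */
};

/* Local copy of the lifetime hint ordering (NOT_SET < NONE < SHORT ...). */
enum model_life_hint {
	MODEL_LIFE_NOT_SET = 0,
	MODEL_LIFE_NONE,
	MODEL_LIFE_SHORT,
	MODEL_LIFE_MEDIUM,
	MODEL_LIFE_LONG,
	MODEL_LIFE_EXTREME,
};

/*
 * Explicit selection: a submitter asks for a stream.  Returns the stream
 * to use, or -1 if the device does not expose that many streams.
 */
int model_pick_write_stream(const struct model_bdev *bdev,
			    unsigned int requested)
{
	assert(bdev);
	if (requested == MODEL_DEFAULT_STREAM)
		return MODEL_DEFAULT_STREAM;
	if (requested > bdev->nr_write_streams)
		return -1;
	return (int)requested;
}

/*
 * Implicit selection: map a per-inode lifetime hint to a stream.
 * Assumed policy: SHORT -> 1, MEDIUM -> 2, ..., clamped to the highest
 * stream the device has; unset hints fall back to the default stream.
 */
int model_hint_to_stream(const struct model_bdev *bdev,
			 enum model_life_hint hint)
{
	unsigned int stream;

	assert(bdev);
	if (hint <= MODEL_LIFE_NONE || bdev->nr_write_streams == 0)
		return MODEL_DEFAULT_STREAM;
	stream = (unsigned int)hint - MODEL_LIFE_NONE;
	if (stream > bdev->nr_write_streams)
		stream = bdev->nr_write_streams;
	return (int)stream;
}
```

With a device exposing four streams, an explicit request for stream 5 is
rejected in this model, while a MODEL_LIFE_EXTREME hint is clamped to
stream 4.  A real implementation may well choose a different policy for
either case.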

The stream API is designed to also be implementable by other files,
so a statx extension to expose the number of handles and their
granularity is added as well.

This currently does not do the write hint mapping for file systems,
which needs to be done in the file system and with careful consideration
of how many of these streams the file system wants to grant to
the application - if any.  It also doesn't support querying how much
has been written to a "granularity unit" aka reclaim unit in NVMe,
which is essential if you want a WAF=1 but apparently not needed for
the current urgent users.
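
As an illustration of why that query matters, the sketch below does the
accounting purely on the host side.  It is a hypothetical model, not part
of the series: the struct and function names are made up, and it assumes
that the stream started on a granularity unit boundary and that the device
never moves to a new unit on its own.  Those assumptions are exactly what
a real query to the device would remove.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side accounting for a single write stream. */
struct model_stream_acct {
	uint64_t granularity;	/* bytes per granularity unit, must be != 0 */
	uint64_t written;	/* bytes written to this stream so far */
};

/* Record that 'bytes' were written to the stream. */
void model_stream_account_write(struct model_stream_acct *s, uint64_t bytes)
{
	assert(s);
	s->written += bytes;
}

/*
 * Bytes left before the current unit is full.  A return value equal to
 * the granularity means the stream sits at the start of a fresh unit.
 */
uint64_t model_stream_unit_remaining(const struct model_stream_acct *s)
{
	assert(s && s->granularity != 0);
	return s->granularity - (s->written % s->granularity);
}
```

An application aiming for a WAF of 1 would use a value like this to fill
and later invalidate whole units at a time, so that the device never has
to relocate still-valid data out of a partially invalidated unit.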

The last patch to support write streams on partitions works, but feels
like a not very nice interface to me, and might allow only too restricted
mappings for some.  It would be great if users that absolutely require
partition support would speak up and help improve it, otherwise I'd suggest
skipping it for the initial submission.

The series is based on Jens' for-next branch as of today, and is also
available as a git tree:

    git://git.infradead.org/users/hch/misc.git block-write-streams

Gitweb:

    http://git.infradead.org/?p=users/hch/misc.git;a=shortlog;h=refs/heads/block-write-streams

Diffstat:
 Documentation/ABI/stable/sysfs-block |   15 +++
 block/bdev.c                         |   15 +++
 block/bio.c                          |    2 
 block/blk-core.c                     |    2 
 block/blk-crypto-fallback.c          |    1 
 block/blk-merge.c                    |   39 ++-------
 block/blk-sysfs.c                    |    6 +
 block/bounce.c                       |    1 
 block/fops.c                         |   23 +++++
 block/genhd.c                        |   52 ++++++++++++
 block/partitions/core.c              |    6 -
 drivers/nvme/host/core.c             |  151 ++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h             |   10 +-
 fs/stat.c                            |    2 
 include/linux/blk_types.h            |    8 +
 include/linux/blkdev.h               |   16 +++
 include/linux/fs.h                   |    1 
 include/linux/nvme.h                 |   77 +++++++++++++++++
 include/linux/stat.h                 |    2 
 include/uapi/linux/io_uring.h        |    4 
 include/uapi/linux/stat.h            |    7 +
 io_uring/io_uring.c                  |    2 
 io_uring/rw.c                        |    2 
 23 files changed, 405 insertions(+), 39 deletions(-)


