[EXT] Re: [PATCHv11 0/9] write hints with nvme fdp and scsi streams
Keith Busch
kbusch at kernel.org
Wed Nov 20 10:11:12 PST 2024
On Wed, Nov 20, 2024 at 09:21:58AM -0800, Darrick J. Wong wrote:
>
> How do filesystem users pick a write stream? I get a pretty strong
> sense that you're aiming to provide the ability for application software
> to group together a bunch of (potentially arbitrary) files in a cohort.
> Then (maybe?) you can say "This cohort of files are all expected to have
> data blocks related to each other in some fashion, so put them together
> so that the storage doesn't have to work so hard".
>
> Part of my comprehension problem here (and probably why few fs people
> commented on this thread) is that I have no idea what FDP is, or what
> the write lifetime hints in scsi were/are, or what the current "hinting"
> scheme is.
FDP is just the "new" version of NVMe's streams. Support for its
predecessor was added in commit f5d118406247acf ("nvme: add support for
streams")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f5d118406247acfc4fc481e441e01ea4d6318fdc
Various applications were written to that interface and showed initial
promise, but production quality hardware never materialized. Some of
these applications are still setting the write hints today, and the
filesystems are all passing them through the block stack, but there's just
currently no nvme driver listening on the other side.
In contrast to the older nvme streams, capable hardware subscribing to this
newer FDP scheme has been developed, and so people want to use those
same applications using those same hints in the exact same way that it
was originally designed. Enabling them could just be a simple driver
patch like the above without bothering the filesystem people :)
> Is this what we're arguing about?
>
> enum rw_hint {
> WRITE_LIFE_NOT_SET = RWH_WRITE_LIFE_NOT_SET,
> WRITE_LIFE_NONE = RWH_WRITE_LIFE_NONE,
> WRITE_LIFE_SHORT = RWH_WRITE_LIFE_SHORT,
> WRITE_LIFE_MEDIUM = RWH_WRITE_LIFE_MEDIUM,
> WRITE_LIFE_LONG = RWH_WRITE_LIFE_LONG,
> WRITE_LIFE_EXTREME = RWH_WRITE_LIFE_EXTREME,
> } __packed;
>
> (What happens if you have two disjoint sets of files, both of which are
> MEDIUM, but they shouldn't be intertwined?)
It's not going to perform as well. You'd be advised against
oversubscribing the hint value among applications with different relative
expectations, but it generally (but not always) should be no worse than
if you hadn't given any hints at all.
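For anyone unfamiliar with how applications set these hints today, here is
a minimal sketch using the per-file fcntl(2) interface (F_SET_RW_HINT,
available since Linux 4.13). The helper names write_hint_name(),
set_write_hint() and apply_and_report() are made up for illustration and
are not part of this series. The fallback #defines assume the values in
the Linux uapi headers, for libc headers that don't expose them.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* asks glibc to expose F_SET_RW_HINT and RWH_* */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fallbacks for older libc headers; values assumed from the Linux uapi. */
#ifndef F_SET_RW_HINT
#define F_GET_RW_HINT 1035
#define F_SET_RW_HINT 1036
#endif
#ifndef RWH_WRITE_LIFE_NOT_SET
#define RWH_WRITE_LIFE_NOT_SET 0
#define RWH_WRITE_LIFE_NONE    1
#define RWH_WRITE_LIFE_SHORT   2
#define RWH_WRITE_LIFE_MEDIUM  3
#define RWH_WRITE_LIFE_LONG    4
#define RWH_WRITE_LIFE_EXTREME 5
#endif

/* Compile-time check that the hint range is the one this sketch assumes. */
static_assert(RWH_WRITE_LIFE_EXTREME == 5, "unexpected write hint values");

/* Hypothetical helper: human-readable name for a hint value. */
const char *write_hint_name(uint64_t hint)
{
	switch (hint) {
	case RWH_WRITE_LIFE_NOT_SET: return "not set";
	case RWH_WRITE_LIFE_NONE:    return "none";
	case RWH_WRITE_LIFE_SHORT:   return "short";
	case RWH_WRITE_LIFE_MEDIUM:  return "medium";
	case RWH_WRITE_LIFE_LONG:    return "long";
	case RWH_WRITE_LIFE_EXTREME: return "extreme";
	default:                     return "unknown";
	}
}

/*
 * Hypothetical helper: set the hint on the file behind fd.
 * Returns 0 on success or a negative errno value on failure.
 */
int set_write_hint(int fd, uint64_t hint)
{
	/* The kernel expects a pointer to a 64-bit hint value. */
	if (fcntl(fd, F_SET_RW_HINT, &hint) == -1)
		return -errno;
	return 0;
}

/* Hypothetical helper: apply a hint and report a failure on stderr. */
int apply_and_report(int fd, uint64_t hint)
{
	int ret = set_write_hint(fd, hint);

	if (ret < 0)
		fprintf(stderr, "hint '%s' not applied: %s\n",
			write_hint_name(hint), strerror(-ret));
	return ret;
}
```

An application would call something like set_write_hint(fd,
RWH_WRITE_LIFE_SHORT) once after open(). The hint is only advisory, so a
failure here can normally be logged and ignored.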
> Or are these new fdp hint things an overload of the existing write hint
> fields in the iocb/inode/bio? With a totally different meaning from
> anticipated lifetime of the data blocks?
The meaning assigned to an FDP stream is whatever the user wants it to
mean. It's not strictly a lifetime hint, but that is certainly a valid
way to use them. The contract on the device's side is that writes to
one stream won't create media interference or contention with writes to
other streams. This is the same as nvme's original streams, which for
some reason did not carry any of this controversy.