[PATCHv10 0/9] write hints with nvme fdp, scsi streams

Keith Busch kbusch at kernel.org
Fri Nov 8 07:51:31 PST 2024


On Fri, Nov 08, 2024 at 03:18:52PM +0100, Christoph Hellwig wrote:
> On Thu, Nov 07, 2024 at 01:36:35PM -0700, Keith Busch wrote:
> > The zone block support all looks pretty neat, but I think you're making
> > this harder than necessary to support streams. You don't need to treat
> > these like a sequential write device. The controller side does its own
> > garbage collection, so no need to duplicate the effort on the host. And
> > it looks like the host side gc potentially merges multiple streams into
> > a single gc stream, so that's probably not desirable.
> 
> We're not really duplicating much.  Writing sequential is pretty easy,
> and tracking reclaim units separately means you need another tracking
> data structure, and either that or the LBA one is always going to be
> badly fragmented if they aren't the same.

You're getting fragmentation anyway, which is why you had to implement
gc. You're just shifting who gets to deal with it from the controller to
the host. The host is further from the media, so you're starting from a
disadvantage. The host gc implementation would have to be quite a bit
better to justify the link and memory usage necessary for the copies
(...cue a copy-offload discussion? OOM?).
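
To put a rough shape on the link cost, here is a minimal sketch. It is
not taken from the patch set, and the function names are hypothetical.
It assumes host gc relocates live data by reading it into host memory
and writing it back out, so every relocated byte crosses the link twice,
while controller gc moves the same data without using the link at all.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes that cross the host<->device link when the
 * host relocates `valid_bytes` of live data. Assumes a plain read into
 * host memory followed by a write back, with no copy offload.
 */
uint64_t host_gc_link_bytes(uint64_t valid_bytes)
{
	/* Guard against overflow before doubling. */
	assert(valid_bytes <= UINT64_MAX / 2);
	return 2 * valid_bytes;
}

/*
 * Hypothetical helper: the same relocation done by the controller.
 * The data stays inside the device, so no link traffic is generated.
 */
uint64_t device_gc_link_bytes(uint64_t valid_bytes)
{
	(void)valid_bytes;
	return 0;
}
```

For example, relocating 1 GiB of live data on the host moves 2 GiB over
the link under these assumptions, plus whatever host memory is needed to
buffer the copies in flight.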

This xfs implementation also has logic to recover from a power failure.
The device already does that if you use the LBA abstraction instead of
tracking sequential write pointers and free blocks.

I think you are underestimating the duplication of effort going on
here.


