[PATCHv10 0/9] write hints with nvme fdp, scsi streams
Javier Gonzalez
javier.gonz at samsung.com
Fri Nov 8 09:43:44 PST 2024
> -----Original Message-----
> From: Matthew Wilcox <willy at infradead.org>
> Sent: Friday, November 8, 2024 5:55 PM
> To: Keith Busch <kbusch at kernel.org>
> Cc: Christoph Hellwig <hch at lst.de>; Keith Busch <kbusch at meta.com>; linux-
> block at vger.kernel.org; linux-nvme at lists.infradead.org; linux-scsi at vger.kernel.org;
> io-uring at vger.kernel.org; linux-fsdevel at vger.kernel.org; joshi.k at samsung.com;
> Javier Gonzalez <javier.gonz at samsung.com>; bvanassche at acm.org
> Subject: Re: [PATCHv10 0/9] write hints with nvme fdp, scsi streams
>
> On Fri, Nov 08, 2024 at 08:51:31AM -0700, Keith Busch wrote:
> > On Fri, Nov 08, 2024 at 03:18:52PM +0100, Christoph Hellwig wrote:
> > > We're not really duplicating much. Writing sequential is pretty easy,
> > > and tracking reclaim units separately means you need another tracking
> > > data structure, and either that or the LBA one is always going to be
> > > badly fragmented if they aren't the same.
> >
> > You're getting fragmentation anyway, which is why you had to implement
> > gc. You're just shifting who gets to deal with it from the controller to
> > the host. The host is further from the media, so you're starting from a
> > disadvantage. The host gc implementation would have to be quite a bit
> > better to justify the link and memory usage necessary for the copies
> > (...queue a copy-offload discussion? oom?).
>
> But the filesystem knows which blocks are actually in use. Sending
> TRIM/DISCARD information to the drive at block-level granularity hasn't
> worked out so well in the past. So the drive is the one at a disadvantage
> because it has to copy blocks which aren't actually in use.
It is true that trim has not been great. I would say that at least enterprise
SSDs have fixed this in general. For FDP, DSM Deallocate is respected, which
provides a good "erase" interface to the host.
It is true though that this is not properly described in the spec and we should
fix it.
>
> I like the idea of using copy-offload though.
We have been iterating on the patches for years, but it is unfortunately
one of those series that goes in circles forever. I don't think it is due
to any specific problem, but mostly due to unaligned requests from
different folks reviewing. Last time I talked to Damien he asked me to
send the patches again; we have not followed through due to bandwidth.
If there is interest, we can re-spin the series...