[PATCHv8 1/6] block, fs: restore kiocb based write hint processing

Keith Busch kbusch at kernel.org
Tue Oct 22 07:37:56 PDT 2024


On Tue, Oct 22, 2024 at 08:43:09AM +0200, Christoph Hellwig wrote:
> On Mon, Oct 21, 2024 at 09:47:47AM -0600, Keith Busch wrote:
> > On Fri, Oct 18, 2024 at 07:50:32AM +0200, Christoph Hellwig wrote:
> > > On Thu, Oct 17, 2024 at 09:09:32AM -0700, Keith Busch wrote:
> > > >  {
> > > >  	*kiocb = (struct kiocb) {
> > > >  		.ki_filp = filp,
> > > >  		.ki_flags = filp->f_iocb_flags,
> > > >  		.ki_ioprio = get_current_ioprio(),
> > > > +		.ki_write_hint = file_write_hint(filp),
> > > 
> > > And we'll need to distinguish between the per-inode and per-file
> > > hint.  I.e. don't blindly initialize ki_write_hint to the per-inode
> > > one here, but make that conditional in the file operation.
> > 
> > Maybe someone wants to do direct-io with partitions where each partition
> > has a different default "hint" when not provided a per-io hint? I don't
> > know of such a case, but it doesn't sound terrible. In any case, I feel
> > if you're directing writes through these interfaces, you get to keep all
> > the pieces: user space controls policy, kernel just provides the
> > mechanisms to do it.
> 
> Eww.  You actually pointed out a real problem here: if a device
> has multiple partitions the write streams as of this series are
> shared by them, which breaks their use case as the applications or
> file systems in different partitions will get other users of the
> write stream randomly overlaid onto theirs.
> 
> So either the available streams need to be split into smaller pools
> by partitions, or we just assign them to the first partition to
> make this scheme work for partitioned devices.
> 
> Either way mixing up the per-inode hint and the dynamic one remains
> a bad idea.

Mixing different stream usages is almost certainly not a good idea, but
that's not the kernel's problem. It's user space policy. I don't think
the kernel needs to make any heroic efforts to split anything here. Just
keep it simple.
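
To make the fallback under discussion concrete, here is a minimal
user-space sketch. It is not the patch and not kernel code. The names
model_file, model_io, HINT_NOT_SET and effective_write_hint() are
hypothetical and exist only for illustration; the one thing taken from
the kernel is the convention that a hint value of 0 means "not set"
(WRITE_LIFE_NOT_SET). A per-I/O hint wins when one is supplied,
otherwise the file's default applies:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "no hint supplied" (0, like WRITE_LIFE_NOT_SET). */
#define HINT_NOT_SET 0

/* Hypothetical model of an open file carrying a default write hint. */
struct model_file {
	uint8_t default_hint;
};

/* Hypothetical model of one write request with an optional per-I/O hint. */
struct model_io {
	uint8_t hint;
};

/*
 * Pick the hint for a single write: use the per-I/O hint if the
 * submitter set one, otherwise fall back to the file's default.
 * No attempt is made to police how user space shares streams.
 */
static uint8_t effective_write_hint(const struct model_file *f,
				    const struct model_io *io)
{
	assert(f && io);

	if (io->hint != HINT_NOT_SET)
		return io->hint;
	return f->default_hint;
}
```

With a file default of 3, a request that leaves its hint unset resolves
to 3, and a request that sets 5 resolves to 5. Whether two partitions
end up sharing a stream is then purely a matter of what hints user space
chooses to pass.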


