[LSF/MM/BPF TOPIC] State Of The Page

Matthew Wilcox willy at infradead.org
Sat Jan 27 10:43:37 PST 2024


On Sat, Jan 27, 2024 at 12:57:45PM -0500, Kent Overstreet wrote:
> On Fri, Jan 19, 2024 at 04:24:29PM +0000, Matthew Wilcox wrote:
> >  - What are we going to do about bio_vecs?
> 
> For bios and biovecs, I think it's important to keep in mind the
> distinction between the code that owns and submits the bio, and the
> consumer underneath.
> 
> The code underneath could just as easily work with pfns, and the code
> above got those pages from somewhere else, so it doesn't _need_ the bio
> for access to those pages/folios (it would be a lot of refactoring
> though).
> 
> But I've been thinking about going in a different direction - what if we
> unified iov_iter and bio? We've got ~3 different scatter-gather types
> that an IO passes through down the stack, and it would be lovely if we
> could get it down to just one; e.g. for DIO, pinning pages right at the
> copy_from_user boundary.

Yes, but ...

One of the things that Xen can do and Linux can't is I/O to/from memory
that doesn't have an associated struct page.  We have all kinds of hacks
in place to get around that right now, and I'd like to remove those.

Since we want that kind of memory (let's take, e.g., GPU memory as an
example) to be mappable to userspace, and we want to be able to do DIO
to that memory, that points us to using a non-page-based structure right
from the start.  Yes, if it happens to be backed by pages we need to 'pin'
them in some way (I'd like to get away from per-page or even per-folio
pinning, but we'll see about that), but the data structure that we use
to represent that memory as it moves through the I/O subsystem needs to
be physical address based.

So my 40,000 foot view is that we do something like get_user_phyrs()
at the start of DIO, pass the phyr to the filesystem; the filesystem then
passes one or more phyrs to the block layer, the block layer gives the
phyrs to the driver which DMA maps the phyr.
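
To make that flow concrete, here is a minimal userspace sketch. It assumes
a phyr is nothing more than a physical start address plus a byte length;
the message above does not define the layout, so struct phyr, phyr_split()
and toy_dma_map() are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a physical range is a start address and a length. */
struct phyr {
	uint64_t addr;	/* physical start address, in bytes */
	uint64_t len;	/* length of the range, in bytes */
};

/*
 * Split one phyr into pieces of at most max_len bytes, roughly the way a
 * filesystem might hand the block layer one phyr per extent.
 * Returns the number of pieces written to out, or 0 if max_len is zero
 * or out_cap is too small to hold every piece.
 */
size_t phyr_split(struct phyr src, uint64_t max_len,
		  struct phyr *out, size_t out_cap)
{
	size_t n = 0;

	if (max_len == 0)
		return 0;

	while (src.len > 0) {
		uint64_t chunk = src.len < max_len ? src.len : max_len;

		if (n == out_cap)
			return 0;

		out[n].addr = src.addr;
		out[n].len = chunk;
		n++;

		src.addr += chunk;
		src.len -= chunk;
	}
	return n;
}

/*
 * Toy stand-in for the driver's DMA mapping step: the bus address is
 * modelled as the physical address plus a fixed offset. A real mapping
 * goes through the DMA API and possibly an IOMMU.
 */
uint64_t toy_dma_map(struct phyr p, uint64_t bus_offset)
{
	return p.addr + bus_offset;
}
```

In this model, a DIO path would build one phyr for the user buffer, the
filesystem would call something like phyr_split() at its extent
boundaries, and the driver would map each resulting piece. No struct page
appears anywhere in that chain, which is the point of the proposal.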

Yes, the IO completion path (for buffered IO) needs to figure out which
folios are described by this phyr, but that's a phys_to_folio() call away.
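
A rough userspace model of that completion-side walk, under the assumption
that folios can be described as (start pfn, page count) pairs held in a
small table. struct toy_folio, toy_phys_to_folio() and
toy_phyr_count_folios() are made up for this sketch; a real
phys_to_folio() would go through the kernel's memory map rather than scan
an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* assume 4KiB pages for the model */

/* Hypothetical layout: a physical start address and a byte length. */
struct phyr {
	uint64_t addr;
	uint64_t len;
};

/* Toy folio: a run of nr_pages contiguous pages starting at start_pfn. */
struct toy_folio {
	uint64_t start_pfn;
	uint64_t nr_pages;
};

/* Return the folio containing physical address phys, or NULL if none. */
const struct toy_folio *toy_phys_to_folio(const struct toy_folio *folios,
					  size_t nr, uint64_t phys)
{
	uint64_t pfn = phys >> TOY_PAGE_SHIFT;

	for (size_t i = 0; i < nr; i++) {
		if (pfn >= folios[i].start_pfn &&
		    pfn - folios[i].start_pfn < folios[i].nr_pages)
			return &folios[i];
	}
	return NULL;
}

/*
 * Count the folios touched by a phyr. Each lookup jumps to the end of the
 * folio it found, so a large folio costs one lookup, not one per page.
 * Addresses with no folio (e.g. device memory) are skipped page by page.
 */
size_t toy_phyr_count_folios(const struct toy_folio *folios, size_t nr,
			     struct phyr p)
{
	uint64_t pos = p.addr;
	uint64_t end = p.addr + p.len;
	size_t count = 0;

	while (pos < end) {
		const struct toy_folio *f = toy_phys_to_folio(folios, nr, pos);

		if (!f) {
			/* No folio here: advance to the next page boundary. */
			pos = ((pos >> TOY_PAGE_SHIFT) + 1) << TOY_PAGE_SHIFT;
			continue;
		}
		count++;
		pos = (f->start_pfn + f->nr_pages) << TOY_PAGE_SHIFT;
	}
	return count;
}
```

A buffered IO completion handler would do its per-folio work (marking the
folio uptodate, ending writeback and so on) at the point where this
sketch increments count.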


