[LSF/MM/BPF TOPIC] dmabuf backed read/write

Pavel Begunkov asml.silence at gmail.com
Fri Feb 6 09:57:14 PST 2026


On 2/6/26 15:20, Jason Gunthorpe wrote:
> On Fri, Feb 06, 2026 at 03:08:25PM +0000, Pavel Begunkov wrote:
>> On 2/5/26 23:56, Jason Gunthorpe wrote:
>>> On Thu, Feb 05, 2026 at 07:06:03PM +0000, Pavel Begunkov wrote:
>>>> On 2/5/26 17:41, Jason Gunthorpe wrote:
>>>>> On Tue, Feb 03, 2026 at 02:29:55PM +0000, Pavel Begunkov wrote:
>>>>>
>>>>>> The proposal consists of two parts. The first is a small in-kernel
>>>>>> framework that allows a dma-buf to be registered against a given file
>>>>>> and returns an object representing a DMA mapping.
>>>>>
>>>>> What is this about and why would you need something like this?
>>>>>
>>>>> The rest makes more sense - pass a DMABUF (or even memfd) to iouring
>>>>> and pre-setup the DMA mapping to get dma_addr_t, then directly use
>>>>> dma_addr_t through the entire block stack right into the eventual
>>>>> driver.
>>>>
>>>> That's more or less what I tried to do in v1, but 1) people didn't like
>>>> the idea of passing raw dma addresses directly, and 2) having it wrapped
>>>> into a black box gives more flexibility, like potentially supporting
>>>> multi-device filesystems.
>>>
>>> Ok.. but what does that have to do with a user space visible file?
>>
>> If you're referring to registration taking a file, it's used to forward
>> this registration to the right driver, which knows about devices and can
>> create dma-buf attachment[s]. The abstraction users get is not just a
>> buffer but rather a buffer registered for a "subsystem" represented by
>> the passed file. With nvme raw bdev as the only importer in the patch set,
>> it simply converges to "registered for the file", but the notion will
>> need to be expanded later, e.g. to accommodate filesystems.
> 
> Sounds completely goofy to me.

Hmm... this discussion is not going to be productive, is it?

> A wrapper around DMABUF that lets you
> attach to DMABUFs? Huh?

I have no idea what you mean; what is "attach to DMABUFs"?
The dma-buf is passed to the driver, which attaches it (as in,
calls dma_buf_dynamic_attach()).
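
For reference, the importer side roughly looks like this (a sketch, not
the actual patches; everything named "my_*" is made up, the dma_buf_*
calls are the in-tree API, and locking/invalidation is simplified):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>

/* Exporter is about to move the buffer: drop the old mapping and
 * re-map lazily on the next IO. */
static void my_move_notify(struct dma_buf_attachment *attach)
{
}

static const struct dma_buf_attach_ops my_attach_ops = {
        .allow_peer2peer = true,        /* accept P2P / MMIO backed buffers */
        .move_notify     = my_move_notify,
};

static struct sg_table *my_import(struct device *dev, int fd,
                                  struct dma_buf_attachment **out)
{
        struct dma_buf *dmabuf;
        struct dma_buf_attachment *attach;
        struct sg_table *sgt;

        dmabuf = dma_buf_get(fd);
        if (IS_ERR(dmabuf))
                return ERR_CAST(dmabuf);

        /* dynamic attachment: the exporter may move the buffer later
         * and tells us via ->move_notify */
        attach = dma_buf_dynamic_attach(dmabuf, dev, &my_attach_ops, NULL);
        if (IS_ERR(attach)) {
                dma_buf_put(dmabuf);
                return ERR_CAST(attach);
        }

        /* long-term mapping kept across IOs (modulo move_notify) */
        sgt = dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);
        if (IS_ERR(sgt)) {
                dma_buf_detach(dmabuf, attach);
                dma_buf_put(dmabuf);
                return sgt;
        }

        *out = attach;
        return sgt;
}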

> I feel like io_uring should be dealing with this internally somehow, not
> creating more and more uapi..

The uapi changes are already minimal and sit outside of the IO path.
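
The model is the same as for today's registered buffers: register once,
then reference by index in the IO path. A sketch using only the current
liburing API (the dma-buf registration step itself is the new part and
isn't shown here):

#include <liburing.h>
#include <stdlib.h>
#include <sys/uio.h>

int fixed_buf_read(int fd)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct iovec iov;
        void *buf;

        if (posix_memalign(&buf, 4096, 4096))
                return -1;
        iov.iov_base = buf;
        iov.iov_len = 4096;

        io_uring_queue_init(8, &ring, 0);

        /* setup path: register the buffer once, outside of the IO path */
        io_uring_register_buffers(&ring, &iov, 1);

        /* IO path: refer to the registered buffer by index */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read_fixed(sqe, fd, buf, 4096, 0, 0);
        io_uring_submit(&ring);
        io_uring_wait_cqe(&ring, &cqe);
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        free(buf);
        return 0;
}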

> The longer term goal has been to get page * out of the io stack and
> start using phys_addr_t. If we could pass the DMABUF's MMIO as a

Except that I already tried passing device-mapped addresses directly,
and it was rejected because it wouldn't be able to handle more
complicated cases like multi-device filesystems, and probably for
other reasons as well. Or would it have to be mapped for each IO?

> phys_addr_t around the IO stack then we only need to close the gap of
> getting the p2p provider into the final DMA mapping.
> 
> A lot of this has improved in the past few cycles where the main issue
> now is carrying the provider and phys_addr_t through the io to the
> nvme driver. vs when you started this and even that fundamental
> infrastructure was missing.
> 
>>>>>> Tushar was helping and mentioned he got good numbers for P2P transfers
>>>>>> compared to bouncing it via RAM.
>>>>>
>>>>> We can already avoid the bouncing, it seems the main improvements here
>>>>> are avoiding the DMA map per-io and allowing the use of P2P without
>>>>> also creating struct page. Meaningful wins for sure.
>>>>
>>>> Yes, and it should probably be nicer for frameworks that already
>>>> expose dma-bufs.
>>>
>>> I'm not sure what this means?
>>
>> I'm saying that when a user app can easily get or already has a
>> dma-buf fd, it should be easier to just use it instead of finding
>> its way to FOLL_PCI_P2PDMA.
> 
> But that all exists already and this proposal does nothing to improve
> it..

dma-buf already exists as well, and I'm ashamed to admit I don't
know how a user program can read into / write from memory provided
by a dma-buf.

>> I'm actually curious, is there a way to somehow create a
>> MEMORY_DEVICE_PCI_P2PDMA mapping out of a random dma-buf?
> 
> No. The driver owning the P2P MMIO has to do this during its probe and
> then it has to provide a VMA with normal pages so GUP works. This is
> usually not hard on the exporting driver side.
> 
> It costs some memory but then everything works naturally in the IO
> stack.
> 
> Your project is interesting and would be a nice improvement, but I
> also don't entirely understand why you are bothering when the P2PDMA
> solution is already fully there ready to go... Is something preventing
> you from creating the P2PDMA pages for your exporting driver?

I'm not doing it for any particular driver but rather trying to
reuse what's already there, i.e. the good coverage of existing
dma-buf exporters and the infrastructure dma-buf provides, e.g.
move_notify. And I'm trying to do that efficiently: avoiding GUP
(which io_uring can already do for normal memory), keeping long
term mappings (modulo move_notify), and so on. That includes
optimising the cost of system memory reads/writes with an IOMMU.
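
For reference, the probe-time setup Jason describes above would roughly
look like the following on the exporting driver's side (a sketch; the
pci_p2pdma_* calls are the in-tree API, the rest is made up and error
unwinding is trimmed):

#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

static int my_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        int ret;

        ret = pcim_enable_device(pdev);
        if (ret)
                return ret;

        /* create struct pages (MEMORY_DEVICE_PCI_P2PDMA) covering BAR 4,
         * so GUP / FOLL_PCI_P2PDMA and the normal IO stack can use it */
        ret = pci_p2pdma_add_resource(pdev, 4, pci_resource_len(pdev, 4), 0);
        if (ret)
                return ret;

        /* optionally let other drivers allocate from this pool */
        pci_p2pmem_publish(pdev, true);

        /* the driver then backs a VMA (e.g. its chardev mmap) with pages
         * from pci_alloc_p2pmem() so user space can GUP them */
        return 0;
}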

-- 
Pavel Begunkov



