[LSF/MM/BPF TOPIC] dmabuf backed read/write
Pavel Begunkov
asml.silence at gmail.com
Mon Feb 9 03:15:22 PST 2026
On 2/4/26 15:26, Nitesh Shetty wrote:
> On 03/02/26 02:29PM, Pavel Begunkov wrote:
>> Good day everyone,
>>
>> dma-buf is a powerful abstraction for managing buffers and DMA mappings,
>> and there is growing interest in extending it to the read/write path to
>> enable device-to-device transfers without bouncing data through system
>> memory. I was encouraged to submit it to LSF/MM/BPF as that might be a
>> useful venue to mull over the details and the capabilities and features
>> people may need.
>>
>> The proposal consists of two parts. The first is a small in-kernel
>> framework that allows a dma-buf to be registered against a given file
>> and returns an object representing a DMA mapping. The actual mapping
>> creation is delegated to the target subsystem (e.g. NVMe). This
>> abstraction centralises request accounting, mapping management, dynamic
>> recreation, etc. The resulting mapping object is passed through the I/O
>> stack via a new iov_iter type.
>>
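To make the kernel-side shape a bit more concrete, here is a rough sketch
of what the registration interface could look like (all names below are
made up for illustration, they are not the series' actual symbols):

#include <linux/dma-buf.h>
#include <linux/fs.h>
#include <linux/uio.h>

/*
 * Hypothetical framework interface: the core ties a dma-buf to a file and
 * hands back an opaque mapping object; creating (and re-creating) the
 * actual DMA mapping is delegated to the target subsystem, e.g. NVMe.
 */
struct dma_reg_token;                           /* opaque mapping object */

struct dma_reg_ops {
        struct dma_reg_token *(*map)(struct file *file,
                                     struct dma_buf *dmabuf);
        void (*unmap)(struct dma_reg_token *token);
};

/* register @dmabuf against @file; mapping creation goes through @ops */
struct dma_reg_token *dma_reg_register(struct file *file,
                                       struct dma_buf *dmabuf,
                                       const struct dma_reg_ops *ops);
void dma_reg_unregister(struct dma_reg_token *token);

/* the mapping object then travels down the I/O stack via a new iov_iter type */
void iov_iter_dma_token(struct iov_iter *iter, unsigned int direction,
                        struct dma_reg_token *token, size_t offset,
                        size_t count);
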
>> As for the user API, a dma-buf is installed as an io_uring registered
>> buffer for a specific file. Once registered, the buffer can be used by
>> read / write io_uring requests as normal. io_uring will enforce that the
>> buffer is only used with "compatible files", which is for now restricted
>> to the target registration file, but will be expanded in the future.
>> Notably, io_uring is a consumer of the framework rather than a
>> dependency, and the infrastructure can be reused.
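
And a rough sketch of the userspace side; IORING_REGISTER_DMABUF and
struct io_uring_dmabuf_reg are placeholders for whatever the final ABI
ends up being, so treat this as pseudo-liburing rather than the series'
actual interface:

#include <liburing.h>
#include <sys/syscall.h>
#include <unistd.h>

/* made-up registration payload, NOT the real ABI */
#define IORING_REGISTER_DMABUF  99
struct io_uring_dmabuf_reg {
        int dmabuf_fd;          /* buffer exported by the device driver */
        int target_fd;          /* file the buffer is registered against */
        unsigned buf_index;     /* slot in the registered buffer table */
};

static int dmabuf_read(int nvme_fd, int dmabuf_fd, size_t len, off_t off)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        struct io_uring_dmabuf_reg reg = {
                .dmabuf_fd = dmabuf_fd,
                .target_fd = nvme_fd,
                .buf_index = 0,
        };
        int ret;

        ret = io_uring_queue_init(8, &ring, 0);
        if (ret)
                return ret;

        /* hypothetical registration tying the dma-buf to nvme_fd */
        ret = syscall(__NR_io_uring_register, ring.ring_fd,
                      IORING_REGISTER_DMABUF, &reg, 1);
        if (ret)
                goto out;

        /*
         * After that it is an ordinary fixed-buffer read at buf_index 0;
         * presumably the "address" is just an offset into the dma-buf.
         */
        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read_fixed(sqe, nvme_fd, NULL, len, off, 0);
        io_uring_submit(&ring);
        ret = io_uring_wait_cqe(&ring, &cqe);
out:
        io_uring_queue_exit(&ring);
        return ret;
}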
>>
> We have been following the series; it is interesting from a couple of angles:
> - IOPS-wise we see a major improvement, especially with an IOMMU enabled
> - The series provides a way to do p2pdma to accelerator memory
>
> Here are a few topics I am looking into specifically:
> - Right now the series uses a PRP list. We need a good way to keep the
> sg_table info around and decide on the fly whether to expose the buffer
> as a PRP list or an SG list, depending on the I/O size.
> - The possibility of further optimisation of the new iov_iter type to
> reduce the per-I/O cost.
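
On the PRP vs SG list point above: the existing nvme-pci driver already
picks between the two formats per request with an average-segment-size
heuristic (its sgl_threshold module parameter), and something of that
flavour could presumably be reused once the sg_table is kept around.
Purely illustrative sketch:

#include <linux/scatterlist.h>

/*
 * Illustration only: large average segments need far fewer SGL
 * descriptors than 4K PRP entries, so prefer the SG list for them and
 * keep PRPs for small transfers.
 */
static bool dmabuf_io_use_sgl(const struct sg_table *sgt, size_t io_len,
                              size_t sgl_threshold)
{
        unsigned int nr_segs = sgt->nents;
        size_t avg_seg_size;

        if (!nr_segs || !sgl_threshold)
                return false;

        avg_seg_size = io_len / nr_segs;
        return avg_seg_size >= sgl_threshold;
}
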
There are a bunch of improvements we can make on the NVMe driver side;
just take a look at what Keith was doing in his series ([2] in the first
email in the thread), which looked very exciting (I dropped it for
simplicity). I was planning to take a closer look at optimising the
driver part afterwards, but if someone wants to take it off my hands,
it'll definitely be welcome!
--
Pavel Begunkov