Memory providers multiplexing (Was: [PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag)

Christian König christian.koenig at amd.com
Wed Jul 12 00:55:51 PDT 2023


On 12.07.23 at 05:42, Mina Almasry wrote:
> On Tue, Jul 11, 2023 at 2:39 PM David Ahern <dsahern at kernel.org> wrote:
>> On 7/11/23 2:39 PM, Jakub Kicinski wrote:
>>> On Tue, 11 Jul 2023 10:06:28 -0700 Mina Almasry wrote:
>>>>>> Any reason not to allow an alternative representation for skb frags than
>>>>>> struct page?
>>>>> I don't think there's a hard technical reason. We can make it work.
>>>> I also think we can switch the representation for skb frags to
>>>> something else. However - please do correct me if I'm wrong - I don't
>>>> think that is sufficient for device memory TCP. My understanding is
>>>> that we also need to modify any NIC drivers that want to use device
>>>> memory TCP to understand a new memory type, and the page pool as well
>>>> if that's involved. I think in particular modifying the memory type in
>>>> all the NIC drivers that want to do device memory TCP is difficult. Do
>>>> you think this is feasible?
>>> That's why I was thinking about adding an abstraction between
>>> the page pool and the driver. Instead of feeding driver pages
>>> a new abstraction could feed the driver just an identifier and a PA.
>> skb frag is currently a bio_vec. Overloading the 'struct page' address
>> in that struct with another address is easy to do. Requiring a certain
>> alignment on the address gives you a few low bits to use as flags / magic
>> / etc.
>>
>> Overloading len and offset is not really possible - way too much code is
>> affected (e.g., iov walking and MSS / TSO segmenting).
>>
>> i.e., you could overload page address with a pointer to an object in your
>> new abstraction layer and the struct has the other meta data.
>>
>> typedef struct skb_frag {
>>          union {
>>                  struct bio_vec bvec;
>>                  struct new_abstraction abs;
>>          };
>> } skb_frag_t;
>>
>> where
>>
>> struct new_abstraction {
>>          void *addr;
>>          unsigned int len;
>>          unsigned int offset;
>> };
>>
>> I have been playing with a similar approach and it co-exists with the existing
>> code quite well with the constraint on location of len and offset.
>>
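
To make the low-bits idea above concrete, here is a minimal userspace
sketch. It is only an illustration under stated assumptions, not the
prototype mentioned above: FRAG_PROVIDER_BIT, struct mem_provider_buf,
struct frag and the frag_*() helpers are all hypothetical names, and the
only property it relies on is that both pointer types are at least
2-byte aligned so that bit 0 is free. len and offset stay in the same
place as in bio_vec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag: bit 0 set means "this is not a struct page". */
#define FRAG_PROVIDER_BIT ((uintptr_t)1)

struct page;	/* opaque stand-in for the kernel type, never dereferenced */

/* Hypothetical object owned by the new abstraction layer. */
struct mem_provider_buf {
	uint64_t dma_addr;
	uint32_t provider_id;
};

/* Pointer first, then len and offset, mirroring the bio_vec layout. */
struct frag {
	uintptr_t addr;		/* struct page * or tagged provider pointer */
	unsigned int len;
	unsigned int offset;
};

static inline void frag_set_page(struct frag *f, struct page *p,
				 unsigned int len, unsigned int offset)
{
	/* Alignment guarantees bit 0 is clear, so it can act as a tag. */
	assert(((uintptr_t)p & FRAG_PROVIDER_BIT) == 0);
	f->addr = (uintptr_t)p;
	f->len = len;
	f->offset = offset;
}

static inline void frag_set_provider(struct frag *f,
				     struct mem_provider_buf *buf,
				     unsigned int len, unsigned int offset)
{
	assert(((uintptr_t)buf & FRAG_PROVIDER_BIT) == 0);
	f->addr = (uintptr_t)buf | FRAG_PROVIDER_BIT;
	f->len = len;
	f->offset = offset;
}

static inline bool frag_is_provider(const struct frag *f)
{
	return (f->addr & FRAG_PROVIDER_BIT) != 0;
}

/* Returns NULL when the frag does not hold a struct page. */
static inline struct page *frag_page(const struct frag *f)
{
	return frag_is_provider(f) ? NULL : (struct page *)f->addr;
}

/* Returns NULL when the frag holds a plain struct page. */
static inline struct mem_provider_buf *frag_provider(const struct frag *f)
{
	if (!frag_is_provider(f))
		return NULL;
	return (struct mem_provider_buf *)(f->addr & ~FRAG_PROVIDER_BIT);
}
```

Code that only walks len and offset would not need to care about the
tag; only code that wants the backing memory has to go through
frag_page() or frag_provider() and handle the NULL case.
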
>>> Whether we want to support fragmentation in that model or not would
>>> have to be decided.
>>>
>>> We can take pages from the page pool and feed them to drivers via
>>> such an API, but drivers need to stop expecting pages.
>> yes, drivers would have to be updated to understand the new format. A
>> downside, but again relatively easy to manage.
>>
> I'm glad to see that you're open to this approach. As far as I
> understand, getting device memory in a struct page form would still be
> preferred, no? And the approach you point to would be a backup plan I
> presume?

Well, yes and no: whether you need struct pages depends on what you want to do.

struct pages are an approach to manage memory regions, but P2PDMA is 
essentially about pumping data between devices.

If your data is for example organized in files on a filesystem then
having struct pages is a must because you need to be able to manage
references, address spaces and so on.

If pages were acquired with alloc_pages() in a driver then you have
struct pages, but you can't necessarily use them the way you want to
because the first few dwords have a different meaning depending on
the use case.

And then you have the use case where you have for example micro 
controllers using P2PDMA to talk with each other. In this case your DMA 
address might not be memory at all, but rather MMIO. E.g. doorbells on
graphics hw are such a case, as are cameras and encoders which
communicate directly with each other.

> Since the good folks on this thread have pointed me to p2pdma to
> address my use case, I've been doing some homework to see if it can
> apply. AFAICT so far, it applies, and Willem actually had a prototype
> of it working a while back. The rough approach Willem and I are
> thinking of would be something like:
>
> 1. The device memory driver would be the p2pdma provider. It would
> expose a user API which allocates a device memory region, calls
> pci_p2pdma_add_resource() and pci_p2pmem_publish() on it, and returns
> a reference to it to the userspace.
>
> 2. The NIC driver would be the p2pdma client and orchestrator. It
> would expose a user API which binds an rxq to a pci device. Prior to
> the bind the user API would check that the pci device has published
> p2p memory (pci_has_p2pmem()), and check that the p2p mem is accessible
> to the driver (pci_p2pdma_distance() I think), etc.
>
> 3. The NIC would allocate pages from the p2pdma provider for incoming
> packets, and create devmem skbs, and deliver the devmem skbs to the
> user using the support in my RFC. AFAICT all that code need not be
> changed.
>
> AFAICT, all the concerns brought up in this thread are sidestepped by
> using p2pdma. I need not allocate struct pages in the core dma-buf
> code anymore (or anywhere), and I need not allocate pgmaps. I would
> just re-use the p2pdma support.
>
> Anyone see any glaring issues with this approach? I plan on trying to
> implement a PoC and sending an RFC v2.

Well, we already have DMA-buf as the user API for this use case, which is
perfectly supported by RDMA if I'm not completely mistaken.

So what problem are you actually trying to solve here?

Regards,
Christian.

>
> The only pending concern is integration with the page pool, but we
> already have some ideas on how to solve that.
>
>>> That's for data buffers only, obviously. We can keep using pages
>>> and raw page pool for headers.
>> yes.
>
>



