[PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag

Jesper Dangaard Brouer jbrouer at redhat.com
Fri Jun 16 11:59:12 PDT 2023



On 16/06/2023 17.01, Alexander Duyck wrote:
> On Fri, Jun 16, 2023 at 5:21 AM Yunsheng Lin <linyunsheng at huawei.com> wrote:
>>
>> On 2023/6/16 2:26, Alexander Duyck wrote:
>>> On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba at kernel.org> wrote:
>>>>
>>>> On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
[...]
>>>>
>>>> I like your patches as they isolate the drivers from having to make the
>>>> fragmentation decisions based on the system page size (4k vs 64k but
>>>> we're hearing more and more about ARM w/ 16k pages). For that use case
>>>> this is great.

+1

[...]
>>>
>>> In the case of the standard page size being 4K a standard page would
>>> just have to take on the CPU overhead of the atomic_set and
>>> atomic_read for pp_ref_count (new name) which should be minimal as on
>>> most sane systems those just end up being a memory write and read.
>>
>> If I understand you correctly, I think what you are trying to do
>> may break some of Jesper' benchmarking:)
>>
>> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
> 
> So? If it breaks an out-of-tree benchmark the benchmark can always be
> fixed. 

It doesn't matter if this is out-of-tree (I should have upstreamed it 
when AKPM asked me to.)

Point is don't break my page_pool fast-path!!! :-P

> The point is enabling a use case that can add value across the
> board instead of trying to force the community to support a niche use
> case.

I'm all for creating a new API, lets call it netmem, that takes care of 
this use-case.
I'm *not* okay with this new API slowing down the page_pool fast-path.

Why not multiplex on a MEM_TYPE, like XDP_MEM_TYPE is prepared for?!?
Meaning the caller can choose which is the correct API call.
(thus, we can stay away from adding code to fast-path case)

See below, copy-paste of code that shows what I mean by multiplex on a 
MEM_TYPE.

> 
> Ideally we should get away from using the pages directly for most
> cases in page pool. In my mind the page pool should start operating
> more like __get_free_pages where what you get is a virtual address
> instead of the actual page. That way we could start abstracting it
> away and eventually get to something more like a true page_pool api
> instead of what feels like a set of add-ons for the page allocator.

Yes, I agree with Alex Duyck here.
Like when I looked at veth proposed changes, it also felt like a virtual 
address would be better than a page.

  addr = netmem_alloc(rq->page_pool, &truesize);

> Although at the end of the day this still feels more like we are just
> reimplementing slab so it is hard for me to say this is necessarily
> the best solution either.

Yes, we have to be careful not to re-implement the MM layer in network 
land ;-)

(below code copy-paste broke whitespaces)

$ git show
commit fe38c642d629f8361f76b25aa8732e5e331d0925 (HEAD -> pp_rm_workqueue04)
Author: Jesper Dangaard Brouer <brouer at redhat.com>
Date:   Fri Jun 16 20:54:08 2023 +0200

     page_pool: code examplifying multiplexing on mem_type

     Signed-off-by: Jesper Dangaard Brouer <brouer at redhat.com>

diff --git a/include/net/xdp.h b/include/net/xdp.h
index d1c5381fc95f..c02ac82a1d79 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -42,6 +42,7 @@ enum xdp_mem_type {
         MEM_TYPE_PAGE_ORDER0,     /* Orig XDP full page model */
         MEM_TYPE_PAGE_POOL,
         MEM_TYPE_XSK_BUFF_POOL,
+       MEM_TYPE_PP_NETMEM,
         MEM_TYPE_MAX,
  };

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d03448a4c411..68be76efef00 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -353,7 +353,7 @@ static void page_pool_set_pp_info(struct page_pool 
*pool,
                                   struct page *page)
  {
         page->pp = pool;
-       page->pp_magic |= PP_SIGNATURE;
+       page->pp_magic |= PP_SIGNATURE | (MEM_TYPE_PAGE_POOL << 8);
         if (pool->p.init_callback)
                 pool->p.init_callback(page, pool->p.init_arg);
  }
@@ -981,6 +981,7 @@ bool page_pool_return_skb_page(struct page *page, 
bool napi_safe)
         struct napi_struct *napi;
         struct page_pool *pp;
         bool allow_direct;
+       int mem_type;

         page = compound_head(page);

@@ -991,9 +992,10 @@ bool page_pool_return_skb_page(struct page *page, 
bool napi_safe)
          * and page_is_pfmemalloc() is checked in __page_pool_put_page()
          * to avoid recycling the pfmemalloc page.
          */
-       if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+       if (unlikely((page->pp_magic & ~0xF03UL) != PP_SIGNATURE))
                 return false;

+       mem_type = (page->pp_magic & 0xF00) >> 8;
         pp = page->pp;

         /* Allow direct recycle if we have reasons to believe that we are
@@ -1009,7 +1011,10 @@ bool page_pool_return_skb_page(struct page *page, 
bool napi_safe)
          * The page will be returned to the pool here regardless of the
          * 'flipped' fragment being in use or not.
          */
-       page_pool_put_full_page(pp, page, allow_direct);
+       if (mem_type == MEM_TYPE_PP_NETMEM)
+               pp_netmem_put_page(pp, page, allow_direct);
+       else
+               page_pool_put_full_page(pp, page, allow_direct);

         return true;
  }
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 41e5ca8643ec..dc4bfbe8f002 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -380,6 +380,11 @@ void __xdp_return(void *data, struct xdp_mem_info 
*mem, bool napi_direct,
         struct page *page;

         switch (mem->type) {
+       case MEM_TYPE_PP_NETMEM:
+               if (napi_direct && xdp_return_frame_no_direct())
+                       napi_direct = false;
+               pp_netmem_put(page->pp, data, napi_direct);
+               break;
         case MEM_TYPE_PAGE_POOL:
                 page = virt_to_head_page(data);
                 if (napi_direct && xdp_return_frame_no_direct())




More information about the Linux-mediatek mailing list