[PATCH net-next v4 4/5] page_pool: remove PP_FLAG_PAGE_FRAG flag
Jesper Dangaard Brouer
jbrouer at redhat.com
Fri Jun 16 11:59:12 PDT 2023
On 16/06/2023 17.01, Alexander Duyck wrote:
> On Fri, Jun 16, 2023 at 5:21 AM Yunsheng Lin <linyunsheng at huawei.com> wrote:
>>
>> On 2023/6/16 2:26, Alexander Duyck wrote:
>>> On Thu, Jun 15, 2023 at 9:51 AM Jakub Kicinski <kuba at kernel.org> wrote:
>>>>
>>>> On Thu, 15 Jun 2023 15:17:39 +0800 Yunsheng Lin wrote:
[...]
>>>>
>>>> I like your patches as they isolate the drivers from having to make the
>>>> fragmentation decisions based on the system page size (4k vs 64k but
>>>> we're hearing more and more about ARM w/ 16k pages). For that use case
>>>> this is great.
+1
[...]
>>>
>>> In the case of the standard page size being 4K a standard page would
>>> just have to take on the CPU overhead of the atomic_set and
>>> atomic_read for pp_ref_count (new name) which should be minimal as on
>>> most sane systems those just end up being a memory write and read.
>>
>> If I understand you correctly, I think what you are trying to do
>> may break some of Jesper' benchmarking:)
>>
>> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/bench_page_pool_simple.c
>
> So? If it breaks an out-of-tree benchmark the benchmark can always be
> fixed.
It doesn't matter if this is out-of-tree (I should have upstreamed it
when AKPM asked me to.)
Point is don't break my page_pool fast-path!!! :-P
> The point is enabling a use case that can add value across the
> board instead of trying to force the community to support a niche use
> case.
I'm all for creating a new API, lets call it netmem, that takes care of
this use-case.
I'm *not* okay with this new API slowing down the page_pool fast-path.
Why not multiplex on a MEM_TYPE, like XDP_MEM_TYPE is prepared for?!?
Meaning the caller can choose which is the correct API call.
(thus, we can stay away from adding code to fast-path case)
See below, copy-paste of code that shows what I mean by multiplex on a
MEM_TYPE.
>
> Ideally we should get away from using the pages directly for most
> cases in page pool. In my mind the page pool should start operating
> more like __get_free_pages where what you get is a virtual address
> instead of the actual page. That way we could start abstracting it
> away and eventually get to something more like a true page_pool api
> instead of what feels like a set of add-ons for the page allocator.
Yes, I agree with Alex Duyck here.
Like when I looked at veth proposed changes, it also felt like a virtual
address would be better than a page.
addr = netmem_alloc(rq->page_pool, &truesize);
> Although at the end of the day this still feels more like we are just
> reimplementing slab so it is hard for me to say this is necessarily
> the best solution either.
Yes, we have to be careful not to re-implement the MM layer in network
land ;-)
(below code copy-paste broke whitespaces)
$ git show
commit fe38c642d629f8361f76b25aa8732e5e331d0925 (HEAD -> pp_rm_workqueue04)
Author: Jesper Dangaard Brouer <brouer at redhat.com>
Date: Fri Jun 16 20:54:08 2023 +0200
page_pool: code examplifying multiplexing on mem_type
Signed-off-by: Jesper Dangaard Brouer <brouer at redhat.com>
diff --git a/include/net/xdp.h b/include/net/xdp.h
index d1c5381fc95f..c02ac82a1d79 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -42,6 +42,7 @@ enum xdp_mem_type {
MEM_TYPE_PAGE_ORDER0, /* Orig XDP full page model */
MEM_TYPE_PAGE_POOL,
MEM_TYPE_XSK_BUFF_POOL,
+ MEM_TYPE_PP_NETMEM,
MEM_TYPE_MAX,
};
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d03448a4c411..68be76efef00 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -353,7 +353,7 @@ static void page_pool_set_pp_info(struct page_pool
*pool,
struct page *page)
{
page->pp = pool;
- page->pp_magic |= PP_SIGNATURE;
+ page->pp_magic |= PP_SIGNATURE | (MEM_TYPE_PAGE_POOL << 8);
if (pool->p.init_callback)
pool->p.init_callback(page, pool->p.init_arg);
}
@@ -981,6 +981,7 @@ bool page_pool_return_skb_page(struct page *page,
bool napi_safe)
struct napi_struct *napi;
struct page_pool *pp;
bool allow_direct;
+ int mem_type;
page = compound_head(page);
@@ -991,9 +992,10 @@ bool page_pool_return_skb_page(struct page *page,
bool napi_safe)
* and page_is_pfmemalloc() is checked in __page_pool_put_page()
* to avoid recycling the pfmemalloc page.
*/
- if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
+ if (unlikely((page->pp_magic & ~0xF03UL) != PP_SIGNATURE))
return false;
+ mem_type = (page->pp_magic & 0xF00) >> 8;
pp = page->pp;
/* Allow direct recycle if we have reasons to believe that we are
@@ -1009,7 +1011,10 @@ bool page_pool_return_skb_page(struct page *page,
bool napi_safe)
* The page will be returned to the pool here regardless of the
* 'flipped' fragment being in use or not.
*/
- page_pool_put_full_page(pp, page, allow_direct);
+ if (mem_type == MEM_TYPE_PP_NETMEM)
+ pp_netmem_put_page(pp, page, allow_direct);
+ else
+ page_pool_put_full_page(pp, page, allow_direct);
return true;
}
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 41e5ca8643ec..dc4bfbe8f002 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -380,6 +380,11 @@ void __xdp_return(void *data, struct xdp_mem_info
*mem, bool napi_direct,
struct page *page;
switch (mem->type) {
+ case MEM_TYPE_PP_NETMEM:
+ if (napi_direct && xdp_return_frame_no_direct())
+ napi_direct = false;
+ pp_netmem_put(page->pp, data, napi_direct);
+ break;
case MEM_TYPE_PAGE_POOL:
page = virt_to_head_page(data);
if (napi_direct && xdp_return_frame_no_direct())
More information about the Linux-mediatek
mailing list