[PATCH net-next v6 04/18] mm: Make the page_frag_cache allocator use per-cpu
David Howells
dhowells at redhat.com
Wed Apr 12 16:12:40 PDT 2023
Christoph Hellwig <hch at infradead.org> wrote:
> On Tue, Apr 11, 2023 at 05:08:48PM +0100, David Howells wrote:
> > Make the page_frag_cache allocator have a separate allocation bucket for
> > each cpu to avoid racing. This means that no lock is required, other than
> > preempt disablement, to allocate from it, though if a softirq wants to
> > access it, then softirq disablement will need to be added.
> ...
> Let me ask a third time as I've not got an answer the last two times:
Sorry about that. I think the problem is that the copy of the message you send
directly to me arrives after the first copy that comes via a mailing list, and
Google then deletes the direct one - as obviously no one could possibly want
duplicates, right? :-/ - so your mails usually get consigned to the
linux-kernel or linux-fsdevel mailing list folder.
> > Make the NVMe, mediatek and GVE drivers pass in NULL to page_frag_cache()
> > and use the default allocation buckets rather than defining their own.
>
> why are these callers treated different from the others?
There are only four users of struct page_frag_cache, the allocator these
patches modify:
(1) GVE.
(2) Mediatek.
(3) NVMe.
(4) skbuff.
Note that things are slightly confused by there being three very similarly
named frag allocators (page_frag and page_frag_1k in addition to
page_frag_cache), and by the fact that the __page_frag_cache_drain() function
gets used for things other than just page_frag_cache.
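For anyone not steeped in this, the two structures look roughly like the
following (simplified - the real field widths depend on PAGE_SIZE and
BITS_PER_LONG, so treat this as a sketch rather than the actual headers):

/* Approximation of the upstream definitions, not the exact header text. */
struct page_frag {
	struct page	*page;
	__u32		offset;
	__u32		size;
};

struct page_frag_cache {
	void		*va;		/* mapped address of the backing page(s) */
	__u16		offset;		/* current allocation offset */
	__u16		size;		/* size of the backing allocation */
	unsigned int	pagecnt_bias;	/* refcount bias so page->_refcount isn't
					 * dirtied on every fragment allocation */
	bool		pfmemalloc;
};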
I've replaced the single allocation buckets with per-cpu allocation buckets
for (1), (2) and (3) so that no locking[*] is required other than temporarily
pinning the allocation to the cpu - but I can't test them as I don't have the
hardware.
[*] Note that what's upstream doesn't have locking, and I'm not sure all the
users of it are SMP-safe.
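To illustrate what "pinning it to the cpu" amounts to, here's a minimal sketch
of the per-cpu bucket idea.  The bucket and wrapper names below are made up
for illustration; only DEFINE_PER_CPU(), get/put_cpu_ptr() and
page_frag_alloc() are real interfaces:

/* Illustrative only, not the patch.  One bucket per CPU; get_cpu_ptr()
 * disables preemption, which is all the "locking" a process-context caller
 * needs.  A softirq user would additionally want local_bh_disable().
 */
static DEFINE_PER_CPU(struct page_frag_cache, example_frag_bucket);

static void *example_frag_alloc(unsigned int fragsz, gfp_t gfp)
{
	struct page_frag_cache *cache;
	void *p;

	cache = get_cpu_ptr(&example_frag_bucket);	/* pin to this CPU */
	p = page_frag_alloc(cache, fragsz, gfp);
	put_cpu_ptr(&example_frag_bucket);		/* unpin */
	return p;
}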
That leaves (4).
Upstream, skbuff.c creates two separate per-cpu frag caches, and I've elected
to retain that, except that the per-cpu bits are now inside the frag
allocator, as I'm not entirely sure why there's a separate napi frag cache in
addition to the netdev_alloc_cache.
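For context, the upstream arrangement in net/core/skbuff.c that this refers
to is roughly the following (simplified, so field order and config guards may
differ from what's actually in the tree):

/* Roughly what upstream net/core/skbuff.c has today. */
static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);

struct napi_alloc_cache {
	struct page_frag_cache	page;
	struct page_frag_1k	page_small;	/* the 1k allocator mentioned above */
	unsigned int		skb_count;
	void			*skb_cache[NAPI_SKB_CACHE_SIZE];
};
static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);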
The general page_frag_cache allocator is used by skb_splice_from_iter() if it
encounters a page it can't take a ref on, so it has been tested through that
using sunrpc, sunrpc+siw and cifs+siw.
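Schematically, the fallback in question is of this shape - a sketch with
made-up helper and variable names, not the code from the series:

/* Sketch only: if we can't take a ref on the caller's page (e.g. it's a
 * slab page), copy the data into a fragment from the general cache and
 * attach that to the skb instead.  The caller would still need to update
 * skb->len, skb->data_len and skb->truesize.  All names are illustrative.
 */
static int example_splice_copy_fallback(struct sk_buff *skb,
					struct page_frag_cache *cache,
					struct page *page, size_t off,
					size_t len, gfp_t gfp)
{
	struct page *fpage;
	void *src, *frag;

	src = kmap_local_page(page);
	frag = page_frag_alloc(cache, len, gfp);
	if (!frag) {
		kunmap_local(src);
		return -ENOMEM;
	}
	memcpy(frag, src + off, len);
	kunmap_local(src);

	fpage = virt_to_head_page(frag);
	skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
			   fpage, frag - page_address(fpage), len);
	return 0;
}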
> Can you show any performance numbers?
As far as I can tell, it doesn't make any obvious difference to pumping data
directly through TCP, or through TLS over TCP, or to transferring data over a
network filesystem such as sunrpc or cifs using siw/TCP. I've tested this
between two machines over a 1G and a 10G link.
I can generate some actual numbers tomorrow.
Actually, I can probably drop patches 2-4 from this patchset and just use the
netdev_alloc_cache in skb_splice_from_iter() for now. Since that copies
unspliceable data, I no longer need to allocate frags in the next layer up.
David