[PATCH net-next] stmmac: align RX buffers

Mon Jun 14 16:21:07 PDT 2021

On Mon, 14 Jun 2021 12:51:11 -0700 (PDT)
David Miller <davem at davemloft.net> wrote:

> 
> But this means the ethernet header will be misaligned and this will
> kill performance on some cpus as misaligned accesses are resolved
> with a trap handler.
> 
> Even on cpus that don't trap, the access will be slower.
> 
> Thanks.

Isn't it the IP header that should be aligned to avoid expensive traps?
From include/linux/skbuff.h:

 * Since an ethernet header is 14 bytes network drivers often end up with
 * the IP header at an unaligned offset. The IP header can be aligned by
 * shifting the start of the packet by 2 bytes. Drivers should do this
 * with:
 *
 * skb_reserve(skb, NET_IP_ALIGN);
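
To make the arithmetic in that comment concrete, here is a minimal
userspace sketch (not kernel code). It assumes NET_IP_ALIGN is 2, its
default value, and a buffer that starts on a 4-byte boundary; the helper
name ip_header_misalignment() is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14	/* Ethernet header length in bytes */

/*
 * Hypothetical helper: given the offset of the frame start from a
 * 4-byte-aligned buffer start, return the IP header offset modulo 4.
 * A result of 0 means the IP header is 4-byte aligned.
 */
static size_t ip_header_misalignment(size_t frame_offset)
{
	return (frame_offset + ETH_HLEN) % 4;
}
```

With the frame at offset 0 the IP header lands at byte 14, so the helper
returns 2; with the frame shifted by 2 bytes it lands at byte 16 and the
helper returns 0.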

But the real problem here is not the header alignment. The problem is
that the rx buffer is copied into an skb, and the two buffers have
different alignments.
If I add this print, I get this for every packet:

--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5460,6 +5460,8 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
+               printk("skb->data alignment: %lu\n", (uintptr_t)skb->data & 7);
+               printk("xdp.data alignment: %lu\n", (uintptr_t)xdp.data & 7);
                skb_copy_to_linear_data(skb, xdp.data, buf1_len);

[ 1060.967768] skb->data alignment: 2
[ 1060.971174] xdp.data alignment: 0
[ 1061.967589] skb->data alignment: 2
[ 1061.970994] xdp.data alignment: 0

And many architectures do an optimized memcpy when the low order bits of the
two pointers match, to name a few:

arch/alpha/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

arch/xtensa/lib/memcopy.S:
	/*
	 * Destination and source are word-aligned, use word copy.
	 */
	# copy 16 bytes per iteration for word-aligned dst and word-aligned src

arch/openrisc/lib/memcpy.c:
	/* If both source and dest are word aligned copy words */
	if (!((unsigned int)dest_w & 3) && !((unsigned int)src_w & 3)) {

And so on. With my patch I (mis)align the two buffers at offset 2
(NET_IP_ALIGN) so the data can be copied faster:

[   16.648485] skb->data alignment: 2
[   16.651894] xdp.data alignment: 2
[   16.714260] skb->data alignment: 2
[   16.717688] xdp.data alignment: 2
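
As a rough illustration of why matching low-order bits help, here is a
toy userspace copy routine. It is a sketch only and not any
architecture's real memcpy: the names same_word_offset() and toy_copy()
are made up, and the byte-by-byte fallback for mismatched pointers is a
simplification (real implementations are usually smarter than that).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_MASK (sizeof(unsigned long) - 1)

/* Hypothetical: nonzero if dst and src share the same offset within a word. */
static int same_word_offset(const void *dst, const void *src)
{
	return (((uintptr_t)dst ^ (uintptr_t)src) & WORD_MASK) == 0;
}

/*
 * Toy copy. Returns how many bytes were moved one word at a time, so the
 * caller can see which path was taken.
 */
static size_t toy_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t word_bytes = 0;

	assert(dst != NULL && src != NULL);

	if (same_word_offset(d, s)) {
		/* Copy head bytes until both pointers are word aligned. */
		while (len && ((uintptr_t)d & WORD_MASK)) {
			*d++ = *s++;
			len--;
		}
		/* Both aligned now: move whole words. */
		while (len >= sizeof(unsigned long)) {
			unsigned long w;

			/* Fixed-size memcpy avoids aliasing problems. */
			memcpy(&w, s, sizeof(w));
			memcpy(d, &w, sizeof(w));
			d += sizeof(w);
			s += sizeof(w);
			len -= sizeof(w);
			word_bytes += sizeof(w);
		}
	}

	/* Tail bytes, or the whole buffer when the offsets differ. */
	while (len--)
		*d++ = *s++;

	return word_bytes;
}
```

With both pointers at offset 2 the routine copies a few head bytes and
then moves whole words. With offsets 2 and 0 it never reaches the word
loop, which corresponds to the case shown in the first log.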

Does this make sense?

Regards,
-- 
per aspera ad upstream