[PATCH net-next] net: axienet: Use NAPI for TX completion path

Thu May 5 13:15:26 PDT 2022

On Thu, 2022-05-05 at 12:56 -0600, Robert Hancock wrote:
> On Thu, 2022-05-05 at 11:08 -0700, Jakub Kicinski wrote:
> > On Thu, 5 May 2022 17:33:39 +0000 Robert Hancock wrote:
> > > On Wed, 2022-05-04 at 19:20 -0700, Jakub Kicinski wrote:
> > > > On Mon, 2 May 2022 19:30:51 +0000 Radhey Shyam Pandey wrote:  
> > > > > Thanks for the patch. I assume for simulating heavy network load we
> > > > > are using netperf/iperf. Do we have some details on the benchmark
> > > > > before and after adding TX NAPI? I want to see the impact on
> > > > > throughput.  
> > > > 
> > > > Seems like a reasonable ask, let's get the patch reposted 
> > > > with the numbers in the commit message.  
> > > 
> > > Didn't mean to ignore that request; looks like I didn't get Radhey's
> > > email directly, odd.
> > >
> > > I did a test with iperf3 from the board (Xilinx MPSoC ZU9EG platform)
> > > connected to a Linux PC via a switch at 1G link speed. With TX NAPI in
> > > place I saw about 942 Mbps TX throughput; with the previous code I saw
> > > 941 Mbps. RX speed was also unchanged at 941 Mbps. So no real
> > > significant change either way. I can spin another version of the patch
> > > that includes these numbers.
> > 
> > Sounds like line rate. Is there a difference in CPU utilization?
> 
> Some measurements on that from the TX load case. In both cases the RX and TX
> IRQs ended up being split across CPU0 and CPU3 due to irqbalance:
> 
> Before:
> 
> CPU0 (RX): 1% hard IRQ, 13% soft IRQ
> CPU3 (TX): 12% hard IRQ, 30% soft IRQ
> 
> After:
> 
> CPU0 (RX): <1% hard IRQ, 29% soft IRQ
> CPU3 (TX): <1% hard IRQ, 21% soft IRQ
> 
> The hard IRQ time is definitely lower, and the total CPU usage is lower as
> well (56% down to 50%). It's interesting that so much of the CPU load ended
> up on the CPU with the RX IRQ though, presumably because the RX and TX IRQs
> are triggering the same NAPI poll operation. Since they're separate IRQs
> that can be on separate CPUs, it might be a win to use separate NAPI poll
> structures for RX and TX so that both CPUs aren't trying to hit the same
> rings (TX and RX)?
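
The quoted totals follow from summing the four per-CPU figures in each case. Writing h_0 and h_3 for the two sub-1% hard IRQ shares after the change (their exact values were not reported), the "after" total is 50% plus less than 2% in total:

```latex
\begin{aligned}
\text{Before:}\quad & 1 + 13 + 12 + 30 = 56\% \\
\text{After:}\quad  & 29 + 21 + h_0 + h_3 = 50\% + h_0 + h_3,
  \qquad 0 \le h_0,\, h_3 < 1\%
\end{aligned}
```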

Indeed, separate RX and TX NAPI polling appears to lower overall CPU usage by
a few percent, and it keeps the TX work on the same CPU as the TX IRQ. I'll
submit a v3 with these changes and will include the softirq numbers in the
commit text.
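
A minimal userspace sketch of the separate-context idea, under loud assumptions: `struct model_napi`, `model_ring_event()`, `model_poll()` and `model_run()` are hypothetical names invented for illustration; this is not the axienet driver code or the kernel API. In the kernel, each ring would instead get its own struct napi_struct registered with netif_napi_add(), scheduled from its IRQ handler with napi_schedule(), and finished with napi_complete_done(). The model only shows the key property: each context masks its own interrupt, drains only its own ring within a budget (64 here, matching NAPI_POLL_WEIGHT), and unmasks only when it finishes under budget, so an RX poll never touches the TX ring.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of one NAPI context (NOT the kernel's
 * struct napi_struct). One instance per ring: RX and TX are independent.
 */
struct model_napi {
	int pending;      /* descriptors waiting on this ring only */
	bool irq_enabled; /* this ring's interrupt is unmasked */
	bool scheduled;   /* a poll is pending, like after napi_schedule() */
};

/*
 * Hardware queues new work on this ring. If the ring's IRQ is unmasked it
 * "fires": the handler masks it and defers the work to the poll routine.
 */
static void model_ring_event(struct model_napi *n, int new_work)
{
	n->pending += new_work;
	if (!n->irq_enabled)
		return; /* already masked; the scheduled poll will see the work */
	n->irq_enabled = false;
	n->scheduled = true;
}

/* Softirq-style poll: handle at most 'budget' items from this ring only. */
static int model_poll(struct model_napi *n, int budget)
{
	int done = n->pending < budget ? n->pending : budget;

	n->pending -= done;
	if (done < budget) {
		/* Ring drained: analogous to napi_complete_done() + unmasking IRQ. */
		n->scheduled = false;
		n->irq_enabled = true;
	}
	/* done == budget: stay scheduled and get polled again. */
	return done;
}

/* Poll this context until it completes; returns the number of poll rounds. */
static int model_run(struct model_napi *n, int budget)
{
	int rounds = 0;

	assert(budget > 0); /* a zero budget could never complete */
	while (n->scheduled) {
		model_poll(n, budget);
		rounds++;
	}
	return rounds;
}
```

For example, raising 150 TX completions and polling with a budget of 64 returns 64, 64 and then 22, after which only the TX interrupt is unmasked again; the RX context is never scheduled. Because each context owns one ring, the two IRQs can sit on different CPUs without both polls contending for the same descriptors.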

-- 
Robert Hancock
Senior Hardware Designer, Calian Advanced Technologies
www.calian.com