[PATCH net-next] net: ti: icssg_prueth: Add SW TX / RX Coalescing based on hrtimers
MD Danish Anwar
danishanwar at ti.com
Wed Apr 24 23:43:58 PDT 2024
Hi Andrew,
On 24/04/24 6:01 pm, Andrew Lunn wrote:
> On Wed, Apr 24, 2024 at 02:48:23PM +0530, MD Danish Anwar wrote:
>> Add SW IRQ coalescing based on hrtimers for RX and TX data path for ICSSG
>> driver, which can be enabled by ethtool commands:
>>
>> - RX coalescing
>> ethtool -C eth1 rx-usecs 50
>>
>> - TX coalescing can be enabled per TX queue
>>
>> - by default enables coalescing for TX0
>> ethtool -C eth1 tx-usecs 50
>> - configure TX0
>> ethtool -Q eth0 queue_mask 1 --coalesce tx-usecs 100
>> - configure TX1
>> ethtool -Q eth0 queue_mask 2 --coalesce tx-usecs 100
>> - configure TX0 and TX1
>> ethtool -Q eth0 queue_mask 3 --coalesce tx-usecs 100 --coalesce tx-usecs 100
>>
>> Minimum value for both rx-usecs and tx-usecs is 20us.
>
> Do you have some benchmark numbers?
>
> Did you see this patch on the mailing list:
>
> https://lore.kernel.org/all/20240415094804.8016-1-paul.barker.ct@bp.renesas.com/T/#md50cb07bbdd6daf985f3796508cf4b246b085268
>
> This is basically a one line change, which brings big performance
> gains. Did you try something as simple as that, rather than all your
> hrtimer code?
>
I did some benchmarking today with the following three configurations:
1. Default driver (without any IRQ coalescing enabled)
2. IRQ coalescing (with this patch)
3. Default IRQ coalescing (as suggested by you in the above patch)
I have pasted the full logs at [1].
Below are the final numbers:
================================================================
Method                 | Tput_TX  | CPU_TX | Tput_RX  | CPU_RX |
================================================================
Default Driver         | 943 Mbps | 31%    | 517 Mbps | 38%    |
IRQ Coalescing (Patch) | 943 Mbps | 28%    | 518 Mbps | 25%    |
Default IRQ Coalescing | 942 Mbps | 32%    | 521 Mbps | 25%    |
================================================================
Throughput is more or less the same for all three methods; only the CPU
load varies. The IRQ coalescing patch (using hrtimers) reduces the CPU
load by 3-4% in TX and 13% in RX, whereas the default method that you
suggested gives no improvement in TX but reduces the CPU load in RX by
the same amount as method 2.
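For anyone who wants to recompute the CPU load differences from the
table, here is a minimal sketch in Python. The figures are copied from
the table above; the names cpu_load and cpu_delta are only illustrative
and are not part of the driver or of the benchmark scripts.

```python
# CPU load figures (percent) copied from the benchmark table.
cpu_load = {
    "Default Driver": {"tx": 31, "rx": 38},
    "IRQ Coalescing (Patch)": {"tx": 28, "rx": 25},
    "Default IRQ Coalescing": {"tx": 32, "rx": 25},
}


def cpu_delta(method, baseline="Default Driver"):
    """Change in CPU load versus the baseline, in percentage points.

    A negative value means the method uses less CPU than the baseline.
    """
    base = cpu_load[baseline]
    return {d: cpu_load[method][d] - base[d] for d in ("tx", "rx")}


for method in cpu_load:
    if method != "Default Driver":
        delta = cpu_delta(method)
        print(f"{method}: TX {delta['tx']:+d} pp, RX {delta['rx']:+d} pp")
```

The differences are in percentage points of CPU load relative to the
default driver, not relative percentages.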
Please let me know if this patch is OK with you based on the benchmarking.
[1]
https://gist.githubusercontent.com/danish-ti/47855631be9f3635cee994693662a988/raw/94b4eb86b42fe243ab03186a88a314e0cb272fd0/gistfile1.txt
> Andrew
--
Thanks and Regards,
Danish