[PATCH v5] arm64: Implement optimised checksum routine

Will Deacon will at kernel.org
Thu Jan 16 02:55:35 PST 2020


On Wed, Jan 15, 2020 at 04:42:39PM +0000, Robin Murphy wrote:
> Apparently there exist certain workloads which rely heavily on software
> checksumming, for which the generic do_csum() implementation becomes a
> significant bottleneck. Therefore let's give arm64 its own optimised
> version - for ease of maintenance this forgoes assembly or intrinsics,
> and is thus not actually arm64-specific, but does rely heavily on C
> idioms that translate well to the A64 ISA and the typical load/store
> capabilities of most ARMv8 CPU cores.
> 
> The resulting increase in checksum throughput scales nicely with buffer
> size, tending towards 4x for a small in-order core (Cortex-A53), and up
> to 6x or more for an aggressive big core (Ampere eMAG).
> 
> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
> 
> ---
> 
> I rigged up a simple userspace test to run the generic and new code for
> various buffer lengths at aligned and unaligned offsets; data is average
> runtime in nanoseconds.
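
For readers without the patch to hand, the general idea of a wide-accumulator
checksum, and of checking it against a simple reference at various lengths and
offsets, can be sketched roughly as below. This is not the code from the patch:
the names csum_simple, csum_wide, fold16 and add64_carry are hypothetical, and
the sketch uses memcpy for unaligned loads rather than the alignment handling a
real implementation would want. Both functions return the folded,
uncomplemented sum in native byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a wide one's-complement accumulator down to 16 bits. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* 64-bit add with end-around carry (one's-complement addition). */
static uint64_t add64_carry(uint64_t a, uint64_t b)
{
	a += b;
	return a + (a < b);	/* a < b means the add wrapped */
}

/* Reference: one native-endian 16-bit word at a time (hypothetical name). */
static uint16_t csum_simple(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0;
	uint16_t w;

	while (len >= 2) {
		memcpy(&w, buf, 2);
		sum = add64_carry(sum, w);
		buf += 2;
		len -= 2;
	}
	if (len) {		/* odd trailing byte, zero-padded in memory */
		w = 0;
		memcpy(&w, buf, 1);
		sum = add64_carry(sum, w);
	}
	return fold16(sum);
}

/* Wide variant: eight bytes per iteration (hypothetical name). */
static uint16_t csum_wide(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0, w;

	while (len >= 8) {
		memcpy(&w, buf, 8);	/* compilers typically emit a single load */
		sum = add64_carry(sum, w);
		buf += 8;
		len -= 8;
	}
	if (len) {		/* 1..7 trailing bytes, zero-padded in memory */
		w = 0;
		memcpy(&w, buf, len);
		sum = add64_carry(sum, w);
	}
	return fold16(sum);
}
```

The two agree because 2^16 is congruent to 1 modulo 65535, so summing 64-bit
words with end-around carry and folding gives the same 16-bit result as summing
the 16-bit words directly. A comparison loop over a range of lengths and
starting offsets is then enough for a basic userspace sanity check.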

Shaokun, Yuke -- please can you give this a spin and let us know how it
works for you? If it looks good, then I can queue it up today/tomorrow.

Thanks,

Will


