[PATCH v5] arm64: Implement optimised checksum routine

Shaokun Zhang zhangshaokun at hisilicon.com
Thu Jan 16 05:59:30 PST 2020


Hi Will,

On 2020/1/16 18:55, Will Deacon wrote:
> On Wed, Jan 15, 2020 at 04:42:39PM +0000, Robin Murphy wrote:
>> Apparently there exist certain workloads which rely heavily on software
>> checksumming, for which the generic do_csum() implementation becomes a
>> significant bottleneck. Therefore let's give arm64 its own optimised
>> version - for ease of maintenance this foregoes assembly or intrinsics,
>> and is thus not actually arm64-specific, but does rely heavily on C
>> idioms that translate well to the A64 ISA and the typical load/store
>> capabilities of most ARMv8 CPU cores.
>>
>> The resulting increase in checksum throughput scales nicely with buffer
>> size, tending towards 4x for a small in-order core (Cortex-A53), and up
>> to 6x or more for an aggressive big core (Ampere eMAG).
>>
>> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>>
>> ---
>>
>> I rigged up a simple userspace test to run the generic and new code for
>> various buffer lengths at aligned and unaligned offsets; data is average
>> runtime in nanoseconds.
> 
> Shaokun, Yuke -- please can you give this a spin and let us know how it
> works for you? If it looks good, then I can queue it up today/tomorrow.
> 

Lingyan has tested this patch; the results are as follows:
1000 loops   general (ns)   csum_hly_128B.c (ns)   csum_robin_v5.s (ns)
      64B:          48510                  40730                  37440
     256B:         104180                  59330                  50210
    1023B:         328580                 124600                  89960
    1024B:         327880                 125300                  88520
    1500B:         466440                 165090                 113560
    2048B:         632060                 212470                 158320
    4095B:        1219850                 393080                 263940
    4096B:        1222740                 399200                 262550

It's better than Lingyan's v4 patch. Thanks to Robin for the work.
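
For readers following the thread, here is a minimal sketch of the general idea
the commit message describes: accumulating the ones' complement sum in wide
words and folding at the end, in plain C. It is not Robin's patch and not the
code that was measured above; the names fold64(), csum_ref() and csum_wide()
are hypothetical, and the result is the raw (non-inverted) 16-bit sum in host
byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fold a 64-bit accumulator down to 16 bits,
 * adding each carry back in (ones' complement addition). */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffffu) + (sum >> 32);	/* at most 32 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* at most 17 bits */
	sum = (sum & 0xffffu) + (sum >> 16);		/* at most 16 bits */
	return (uint16_t)sum;
}

/* Reference: sum one 16-bit word at a time, zero-padding an odd tail byte.
 * The result is NOT inverted. */
static uint16_t csum_ref(const unsigned char *p, size_t len)
{
	uint64_t sum = 0;
	uint16_t w;

	for (; len >= 2; p += 2, len -= 2) {
		memcpy(&w, p, sizeof(w));	/* alignment-safe load */
		sum += w;
	}
	if (len) {
		w = 0;
		memcpy(&w, p, 1);
		sum += w;
	}
	return fold64(sum);
}

/* Wide variant: sum 64 bits at a time with end-around carry. Because
 * 2^16 is congruent to 1 modulo 0xffff, the folded result matches the
 * 16-bit reference. */
static uint16_t csum_wide(const unsigned char *p, size_t len)
{
	uint64_t sum = 0, w;

	for (; len >= 8; p += 8, len -= 8) {
		memcpy(&w, p, sizeof(w));
		sum += w;
		sum += (sum < w);		/* add the carry back in */
	}
	if (len) {
		w = 0;
		memcpy(&w, p, len);		/* zero-padded tail, 1..7 bytes */
		sum += w;
		sum += (sum < w);
	}
	return fold64(sum);
}
```

A comparison such as csum_wide(buf, n) == csum_ref(buf, n) over a range of
lengths and offsets is one way to sanity-check a faster routine before timing
it.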

If you are happy, please feel free to add:
Reported-by: Lingyan Huang <huanglingyan2 at huawei.com>
Tested-by: Lingyan Huang <huanglingyan2 at huawei.com>

Thanks,
Shaokun

> Thanks,
> 
> Will
> 