[PATCH v2 0/2] RISC-V: Optimize memset for data sizes less than 16 bytes

Andrew Jones ajones at ventanamicro.com
Thu May 11 00:44:39 PDT 2023


On Thu, May 11, 2023 at 09:26:04AM +0800, zhangfei wrote:
> From: zhangfei <zhangfei at nj.iscas.ac.cn>
> 
> At present, memset stores one byte at a time when processing tail data or
> when the initial data size is less than 16 bytes. This approach is not
> efficient. Therefore, I fill the head and tail with minimal branching. Each
> conditional ensures that all the subsequently used offsets are well-defined
> and inside the dest region. Although this approach may issue redundant
> stores, compared to byte-by-byte stores it allows the store instructions to
> execute in parallel, reduces the number of jumps, and ultimately improves
> performance.
> 
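
A rough C sketch of the head/tail fill idea described above, for sizes up
to 16 bytes. This is not the assembly from the patch. The names small_fill
and small_fill_matches_memset are hypothetical and exist only for this
illustration, and the exact branch thresholds are an assumption. Each size
check guarantees that the offsets used after it stay inside the buffer, so
some bytes may be written twice but no per-byte loop is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill n bytes (n <= 16) at s with c using
 * overlapping head and tail stores instead of a byte-by-byte loop. */
void small_fill(unsigned char *s, unsigned char c, size_t n)
{
    if (n == 0)
        return;
    s[0] = c;
    s[n - 1] = c;
    if (n <= 2)
        return;
    /* n >= 3, so offsets 1, 2, n-2 and n-3 are all inside the buffer. */
    s[1] = c;
    s[2] = c;
    s[n - 2] = c;
    s[n - 3] = c;
    if (n <= 6)
        return;
    /* n >= 7, so offsets 3 and n-4 are inside the buffer. */
    s[3] = c;
    s[n - 4] = c;
    if (n <= 8)
        return;
    /* 9 <= n <= 16: finish bytes 4..7 and n-8..n-5; they may overlap. */
    s[4] = c;
    s[5] = c;
    s[6] = c;
    s[7] = c;
    s[n - 5] = c;
    s[n - 6] = c;
    s[n - 7] = c;
    s[n - 8] = c;
}

/* Hypothetical self-check: compare against memset for every n in 0..16,
 * with one guard byte on each side to catch out-of-range stores. */
int small_fill_matches_memset(void)
{
    for (size_t n = 0; n <= 16; n++) {
        unsigned char got[18], want[18];

        memset(got, 0xAA, sizeof got);
        memset(want, 0xAA, sizeof want);
        small_fill(got + 1, 0x5C, n);
        memset(want + 1, 0x5C, n);
        if (memcmp(got, want, sizeof got) != 0)
            return 0;
    }
    return 1;
}
```

The stores between two checks do not depend on each other, which is what
would let a core issue them in parallel.
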
> I used the code linked below for performance testing, and commented out the
> ARM-specific memset calls in it so that it runs properly on the RISC-V
> platform.
> 
> [1] https://github.com/ARM-software/optimized-routines/blob/master/string/bench/memset.c#L53
> 
> The tests were run on a RISC-V SiFive U74. The test data is as follows:
> 
> Before optimization
> ---------------------
> Random memset (bytes/ns):
>            memset_call 32K:0.45 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.30
> 
> Medium memset (bytes/ns):
>            memset_call 8B:0.18 16B:0.48 32B:0.91 64B:1.63 128B:2.71 256B:4.40 512B:5.67
> Large memset (bytes/ns):
>            memset_call 1K:6.62 2K:7.02 4K:7.46 8K:7.70 16K:7.82 32K:7.63 64K:1.40
> 
> After optimization
> ---------------------
> Random memset (bytes/ns):
>            memset_call 32K:0.46 64K:0.35 128K:0.30 256K:0.28 512K:0.27 1024K:0.25 avg 0.31
>
> Medium memset (bytes/ns):
>            memset_call 8B:0.27 16B:0.48 32B:0.91 64B:1.64 128B:2.71 256B:4.40 512B:5.67
> Large memset (bytes/ns):
>            memset_call 1K:6.62 2K:7.02 4K:7.47 8K:7.71 16K:7.83 32K:7.63 64K:1.40
> 
> The results show that memset performance improves significantly at a data
> size of around 8B, from 0.18 bytes/ns to 0.27 bytes/ns.
> 
> The previous work was as follows:
> 1. "[PATCH] riscv: Optimize memset"
>    6d1cbe2e.3c31d.187eb14d990.Coremail.zhangfei at nj.iscas.ac.cn

Cover letters should have a changelog, in this case a couple phrases
stating what's different in v2 vs. v1.

Thanks,
drew

> 
> Thanks,
> Fei Zhang
> 
> Andrew Jones (1):
>   RISC-V: lib: Improve memset assembler formatting
> 
>  arch/riscv/lib/memset.S | 143 ++++++++++++++++++++--------------------
>  1 file changed, 72 insertions(+), 71 deletions(-)
> 
> zhangfei (1):
>   RISC-V: lib: Optimize memset performance
> 
>  arch/riscv/lib/memset.S | 40 +++++++++++++++++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 3 deletions(-)
> 


