[PATCH] riscv: Optimize memset

zhangfei zhang_fei_0403 at 163.com
Wed May 10 18:42:43 PDT 2023


From: zhangfei <zhangfei at nj.iscas.ac.cn>

On Wed, May 10, 2023 at 14:58:22PM +0200, Andrew Jones wrote:
> On Wed, May 10, 2023 at 11:52:43AM +0800, zhangfei wrote:
> > From: zhangfei <zhangfei at nj.iscas.ac.cn>
> > 
> > On Tue, May 09, 2023 at 11:16:33AM +0200, Andrew Jones wrote:
> > > On Tue, May 09, 2023 at 10:22:07AM +0800, zhangfei wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I filled head and tail with minimal branching. Each conditional ensures that 
> > > > all the subsequently used offsets are well-defined and in the dest region.
> > > 
> > > I know. You trimmed my comment, so I'll quote myself, here
> > > 
> > > """
> > > After the check of a2 against 6 above we know that offsets 6(t0)
> > > and -7(a3) are safe. Are we trying to avoid too many redundant
> > > stores with these additional checks?
> > > """
> > > 
> > > So, again. Why the additional check against 8 above and, the one you
> > > trimmed, checking 10?
> > 
> > Hi,
> > 
> > These additional checks are there to avoid too many redundant stores.
> > 
> > The check against 8 bytes is needed because, once loop '3' exits,
> > fewer than 8 bytes remain, so it also avoids redundant stores.
> 
> So the benchmarks showed these additional checks were necessary to avoid
> making memset worse? Please add comments to the code explaining the
> purpose of the checks.

Hi,

As you suspected, memset gets worse without these additional checks.
When I removed the checks against 8 and 10 above, the benchmark showed
memset at 0.21 bytes/ns for 8B. That is still better than storing byte
by byte, but with the additional checks it improves further, to
0.27 bytes/ns.
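
For anyone following the thread, here is a rough C sketch of the
head/tail fill idea being discussed. It is not the assembly from the
patch. The function names are hypothetical, and the thresholds (2, 6, 8)
are assumptions modeled on the layout used by generic C implementations
such as musl's memset; the patch itself checks against different values.
The point is only that each size check guarantees the offsets stored
after it are inside the destination, and that each early return skips
stores that would otherwise be redundant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name: illustrates the technique, not the patch's code. */
void *sketch_memset(void *dest, int c, size_t n)
{
    unsigned char *s = dest;
    unsigned char b = (unsigned char)c;

    if (n == 0)
        return dest;
    /* n >= 1: offsets 0 and n-1 are in range (same byte when n == 1). */
    s[0] = b;
    s[n - 1] = b;
    if (n <= 2)
        return dest;
    /* n >= 3: offsets 1, 2, n-2 and n-3 are in range. */
    s[1] = b;
    s[2] = b;
    s[n - 2] = b;
    s[n - 3] = b;
    if (n <= 6)
        return dest;
    /* n >= 7: offsets 3 and n-4 are in range. */
    s[3] = b;
    s[n - 4] = b;
    if (n <= 8)
        return dest;
    /* n >= 9: bytes 0..3 and n-4..n-1 are done, fill the middle. */
    for (size_t i = 4; i < n - 4; i++)
        s[i] = b;
    return dest;
}

/* Returns 1 if sketch_memset matches memset for every n in 0..max_n. */
int sketch_matches_memset(size_t max_n)
{
    unsigned char got[66], want[66];

    assert(max_n <= 64); /* buffers hold 64 bytes plus two guard bytes */
    for (size_t n = 0; n <= max_n; n++) {
        memset(got, 0xAA, sizeof got);
        memset(want, 0xAA, sizeof want);
        sketch_memset(got + 1, 0x5C, n);
        memset(want + 1, 0x5C, n);
        if (memcmp(got, want, sizeof got) != 0)
            return 0; /* wrong fill or a guard byte was overwritten */
    }
    return 1;
}
```

Dropping the intermediate returns would still be correct for sizes
above 8, but small sizes would either need a different path or pay for
overlapping stores, which is the trade-off the benchmark numbers above
are measuring.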

I apologize for the confusing response in my previous email. I have
reorganized the patch as v2 and sent it to you. Please reply under the
latest patch.

Thanks,
Fei Zhang



