riscv syscall performance regression

Tue Aug 13 12:15:41 PDT 2024

On Tue, Aug 13, 2024 at 02:51:09PM +0200, Alexandre Ghiti wrote:
> Hi Fei,
> 
> On 23/02/2024 06:28, Wu, Fei wrote:
> > Hi All,
> > 
> > I am doing some performance regression testing on a sophgo machine, the
> > unixbench syscall benchmark drops 14% from 6.1 to 6.6. This change
> > should be due to commit f0bddf50 riscv: entry: Convert to generic entry.
> > I know it's a tradeoff, just checking if it's been discussed already and
> > any improvement can be done.
> > 
> > The unixbench benchmark I used is:
> > 	$ ./syscall 10 getpid
> > 
> > The dynamic instruction count per syscall is increased from ~200 to
> > ~250, this should be the key factor so I switch to test it on system
> > QEMU to avoid porting different versions on sophgo, and use plugin
> > libinsn.so to count the instructions. There are a few background noises
> > during test but the impact should be limited. This is dyninst count per
> > syscall I got:
> > 
> > * commit d0db02c6 (right before the change): ~200
> > * commit f0bddf50 (the change): ~250
> > * commit ffd2cb6b (latest upstream): ~250
> > 
> > Any comment?
> > 
> > Thanks,
> > Fei.
> > 
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> 
> So I finally took some time to look into this. Indeed the conversion to the
> generic entry introduced the overhead you observe.
> 
> The numbers I get are similar:
> 
> * commit d0db02c6 (right before the change): 185
> 
> *  6.11-rc3: 245
> 
> I dived a bit deeper and noticed that we could regain ~40 instructions by
> inlining syscall_exit_to_user_mode() and do_trap_ecall_u():
> 
> - we used to intercept the syscall trap but now it's dealt with in the
> exception vector, not sure if we can inline do_trap_ecall_u()
> - I quickly tried to inline syscall_exit_to_user_mode() but it pulls quite a
> few functions and I failed to do so.
> 
> Note that a recent effort already inlined most of the common entry functions
> already
> https://lore.kernel.org/all/20231218074520.1998026-1-svens@linux.ibm.com/
> 
> The remaining instructions are caused by:
> 
> * the vector extension handling. It won't improve the above numbers because
> the test does not use the vector extension, but we could improve
> __riscv_v_vstate_discard() as mentioned in commit 9657e9b7d253 ("riscv:
> Discard vector state on syscalls")
> * the random kernel stack offset
> 
> I'll add some performance regressions in my CI in the near future :)
> 
> Thanks,
> 
> Alex

I have written patches to do this inlining but haven't sent it out yet.
I don't know a good way of showing performance improvement so I have
been hesistant to send it. It is generic so showing the improvement on
x86 is probably the best. I have also written some patches for cleaning
up some of the other syscall handling but again haven't been able to
show performance numbers. I was going to use a thead board but was
unable to get it to boot on an up-to-date kernel as I posted about here
[1]. The patches here [2] should also show improvements.

I can try to get some numbers and send out the patches.

Link: https://lore.kernel.org/linux-arm-kernel/ZoydV7vad5JWIcZb@ghost/
[1]
Link:
https://patchwork.kernel.org/project/linux-riscv/cover/20240720171232.1753-1-jszhang@kernel.org/
[2]

- Charlie

> 
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv