[PATCH v2 3/2] RISC-V: sbi: remove sbi_ecall tracepoints

Radim Krčmář rkrcmar at ventanamicro.com
Wed Jun 25 00:51:45 PDT 2025


2025-06-24T15:09:09+02:00, Radim Krčmář <rkrcmar at ventanamicro.com>:
> For another example, let's have the following function:
>
>   struct sbiret some_sbi_ecall(uintptr_t a0, uintptr_t a1)
>   {
>     return sbi_ecall(123, 456, a0, a1);
>   }
>
> The disassembly without tracepoints (with -fno-omit-frame-pointer):
> (It could have been just "li;li;ecall;ret" without frame pointer.)
>
>    0xffffffff80016d48 <+0>:	addi	sp,sp,-16
>    0xffffffff80016d4a <+2>:	sd	ra,8(sp)
>    0xffffffff80016d4c <+4>:	sd	s0,0(sp)
>    0xffffffff80016d4e <+6>:	addi	s0,sp,16
>    0xffffffff80016d50 <+8>:	li	a7,123
>    0xffffffff80016d54 <+12>:	li	a6,456
>    0xffffffff80016d58 <+16>:	ecall
>    0xffffffff80016d5c <+20>:	ld	ra,8(sp)
>    0xffffffff80016d5e <+22>:	ld	s0,0(sp)
>    0xffffffff80016d60 <+24>:	addi	sp,sp,16
>    0xffffffff80016d62 <+26>:	ret
>
> [ Removed previous disassembly with tracepoints. ]
>                                                                I'll try
> again with GCC 15.1, and get back if it actually improves the situation.

GCC 15.1 still leaves "mv" outside the branch, but at least seems to be
on the right track (undesired overhead is marked with leading stars):

   0xffffffff800236e8 <+0>:	addi	sp,sp,-48
   0xffffffff800236ea <+2>:	sd	s0,32(sp)
   0xffffffff800236ec <+4>:	sd	ra,40(sp)
   0xffffffff800236ee <+6>:	addi	s0,sp,48
*  0xffffffff800236f0 <+8>:	mv	a4,a0
*  0xffffffff800236f2 <+10>:	mv	a5,a1
   0xffffffff800236f4 <+12>:	nop
*  0xffffffff800236f8 <+16>:	mv	a0,a4
*  0xffffffff800236fa <+18>:	mv	a1,a5
   0xffffffff800236fc <+20>:	li	a7,123
   0xffffffff80023700 <+24>:	li	a6,456
   0xffffffff80023704 <+28>:	ecall
*  0xffffffff80023708 <+32>:	mv	a5,a0
*  0xffffffff8002370a <+34>:	mv	a2,a1
   0xffffffff8002370c <+36>:	nop
   0xffffffff80023710 <+40>:	ld	ra,40(sp)
   0xffffffff80023712 <+42>:	ld	s0,32(sp)
*  0xffffffff80023714 <+44>:	mv	a0,a5
*  0xffffffff80023716 <+46>:	mv	a1,a2
   0xffffffff80023718 <+48>:	addi	sp,sp,48
   0xffffffff8002371a <+50>:	ret
   [Tracing goes to +126]

I realized I had the environment configured for clang in the last mail,
so here is actual GCC 14.3, which also spills in the prologue:

   0xffffffff80023360 <+0>:	addi	sp,sp,-48
   0xffffffff80023362 <+2>:	sd	s0,32(sp)
*  0xffffffff80023364 <+4>:	sd	s1,24(sp)
*  0xffffffff80023366 <+6>:	sd	s2,16(sp)
   0xffffffff80023368 <+8>:	sd	ra,40(sp)
   0xffffffff8002336a <+10>:	addi	s0,sp,48
*  0xffffffff8002336c <+12>:	mv	s2,a0
*  0xffffffff8002336e <+14>:	mv	s1,a1
   0xffffffff80023370 <+16>:	nop
*  0xffffffff80023374 <+20>:	mv	a0,s2
*  0xffffffff80023376 <+22>:	mv	a1,s1
   0xffffffff80023378 <+24>:	li	a7,123
   0xffffffff8002337c <+28>:	li	a6,456
   0xffffffff80023380 <+32>:	ecall
*  0xffffffff80023384 <+36>:	mv	s2,a0
*  0xffffffff80023386 <+38>:	mv	s1,a1
   0xffffffff80023388 <+40>:	nop
   0xffffffff8002338c <+44>:	ld	ra,40(sp)
   0xffffffff8002338e <+46>:	ld	s0,32(sp)
*  0xffffffff80023390 <+48>:	mv	a0,s2
*  0xffffffff80023392 <+50>:	mv	a1,s1
*  0xffffffff80023394 <+52>:	ld	s2,16(sp)
*  0xffffffff80023396 <+54>:	ld	s1,24(sp)
   0xffffffff80023398 <+56>:	addi	sp,sp,48
   0xffffffff8002339a <+58>:	ret
   [Tracing goes to +108]

And clang in the last mail inlined the tracepoints, because I put the
example function in sbi_ecall.c, which bloated the tracing slowpath, and
spilled one more register than needed.
With the function in sbi.c, to better simulate actual use (gcc examples
are already doing this), clang 20.1.6 and 19.1.7 do:

   0xffffffff80016f08 <+0>:	addi	sp,sp,-32
   0xffffffff80016f0a <+2>:	sd	ra,24(sp)
   0xffffffff80016f0c <+4>:	sd	s0,16(sp)
*  0xffffffff80016f0e <+6>:	sd	s1,8(sp)
*  0xffffffff80016f10 <+8>:	sd	s2,0(sp)
   0xffffffff80016f12 <+10>:	addi	s0,sp,32
   0xffffffff80016f14 <+12>:	nop
   0xffffffff80016f18 <+16>:	li	a7,123
   0xffffffff80016f1c <+20>:	li	a6,456
   0xffffffff80016f20 <+24>:	ecall
   0xffffffff80016f24 <+28>:	nop
   0xffffffff80016f28 <+32>:	ld	ra,24(sp)
   0xffffffff80016f2a <+34>:	ld	s0,16(sp)
*  0xffffffff80016f2c <+36>:	ld	s1,8(sp)
*  0xffffffff80016f2e <+38>:	ld	s2,0(sp)
   0xffffffff80016f30 <+40>:	addi	sp,sp,32
   0xffffffff80016f32 <+42>:	ret
   [Tracing goes to +94]

When compared to GCC 15.1, clang spills in the prologue, but doesn't
store around the static branch sites.  The optimal result would be a
combination of what clang and GCC 15.1 do (code without any stars).

When I looked at real code samples, the behavior was roughly similar.
GCC just wasn't always placing the "mv"s as obviously.



More information about the linux-riscv mailing list