[PATCH v2 3/2] RISC-V: sbi: remove sbi_ecall tracepoints

Radim Krčmář rkrcmar at ventanamicro.com
Tue Jun 24 06:09:09 PDT 2025


2025-06-23T15:54:00-07:00, Palmer Dabbelt <palmer at dabbelt.com>:
> Having patch 3 of 2 is not normal.

Sorry, I wanted to distinguish it from the original series without
sending a new one, because it's quite a radical proposal that I don't
necessarily want to get merged.
Would "[RFC 3/2]", "[RFC 3/3]", or something else look better while
raising the same alarms?

> On Thu, 19 Jun 2025 12:03:15 PDT (-0700), rkrcmar at ventanamicro.com wrote:
> So the issue is the extra save/restore on function entry?  That's the 
> sort of thing shrink wrapping is supposed to help with.  It's been 
> implemented in GCC for a while, but I'm not sure how well it's been 
> pushed on (IIRC it was just one of the SPEC workloads).

Yes, shrink wrapping could help if compilers can figure out what to do
with static_keys. It's hopefully going to sort itself out in the future.
We'd ideally have some way to tell the compiler to always keep the
tracepoints inside their branches, to make them less fragile, but that
is probably asking too much from C.
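
To make the shape of the problem concrete, here is a minimal user-space
sketch of the pattern, not the kernel code.  It assumes a plain bool in
place of a static key (the kernel patches a nop/jump instead of loading a
flag), and the names trace_enabled, trace_slowpath, fake_ecall and
traced_call are hypothetical, made up for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a static key; a real static key is a
 * patched nop/jump, not a memory load. */
static bool trace_enabled;
static unsigned long trace_hits;

/* Out-of-line cold path: the tracing work lives here, away from the
 * hot function body. */
static __attribute__((noinline, cold))
void trace_slowpath(const char *what, uintptr_t x, uintptr_t y)
{
	trace_hits++;
	fprintf(stderr, "%s: %lu %lu\n", what,
		(unsigned long)x, (unsigned long)y);
}

/* Hypothetical stand-in for the ecall itself. */
static uintptr_t fake_ecall(uintptr_t a0, uintptr_t a1)
{
	return a0 + a1;
}

/* Hot path with an unlikely tracing branch on each side of the call.
 * Arguments and the result are live across the branches, which is what
 * tempts the compiler to spill callee-saved registers up front. */
uintptr_t traced_call(uintptr_t a0, uintptr_t a1)
{
	uintptr_t ret;

	if (__builtin_expect(trace_enabled, 0))
		trace_slowpath("enter", a0, a1);
	ret = fake_ecall(a0, a1);
	if (__builtin_expect(trace_enabled, 0))
		trace_slowpath("exit", ret, 0);
	return ret;
}
```

Whether the register saves end up only inside the unlikely branches or in
the function prologue depends on the compiler's shrink wrapping, so the
output of e.g. "gcc -O2 -S" is worth comparing across compiler versions.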

I think GCC 15.1 had some shrink-wrapping improvements, but I've only
been using 14.3 so far...

> That said, this is kind of hard to reason about.  Can you pull out a 
> smaller example?

I posted an example of the original 8-argument ecall in v1:
https://lore.kernel.org/linux-riscv/20250612145754.2126147-2-rkrcmar@ventanamicro.com/T/#m1d441ab3de3e6d6b3b8d120b923f2e2081918a98
For another example, let's have the following function:

  struct sbiret some_sbi_ecall(uintptr_t a0, uintptr_t a1)
  {
    return sbi_ecall(123, 456, a0, a1);
  }

The disassembly without tracepoints (with -fno-omit-frame-pointer):
(It could have been just "li;li;ecall;ret" without frame pointer.)

   0xffffffff80016d48 <+0>:	addi	sp,sp,-16
   0xffffffff80016d4a <+2>:	sd	ra,8(sp)
   0xffffffff80016d4c <+4>:	sd	s0,0(sp)
   0xffffffff80016d4e <+6>:	addi	s0,sp,16
   0xffffffff80016d50 <+8>:	li	a7,123
   0xffffffff80016d54 <+12>:	li	a6,456
   0xffffffff80016d58 <+16>:	ecall
   0xffffffff80016d5c <+20>:	ld	ra,8(sp)
   0xffffffff80016d5e <+22>:	ld	s0,0(sp)
   0xffffffff80016d60 <+24>:	addi	sp,sp,16
   0xffffffff80016d62 <+26>:	ret

With tracepoints, the situation is worse... the optimal outcome would
add two nops, but the actual result is:

   0xffffffff80017720 <+0>:	addi	sp,sp,-48
   0xffffffff80017722 <+2>:	sd	ra,40(sp)
   0xffffffff80017724 <+4>:	sd	s0,32(sp)
   0xffffffff80017726 <+6>:	sd	s1,24(sp)
   0xffffffff80017728 <+8>:	sd	s2,16(sp)
   0xffffffff8001772a <+10>:	sd	s3,8(sp)
   0xffffffff8001772c <+12>:	addi	s0,sp,48
   0xffffffff8001772e <+14>:	nop
   0xffffffff80017730 <+16>:	nop
   0xffffffff80017734 <+20>:	li	a7,123
   0xffffffff80017738 <+24>:	li	a6,456
   0xffffffff8001773c <+28>:	ecall
   0xffffffff80017740 <+32>:	nop
   0xffffffff80017744 <+36>:	ld	ra,40(sp)
   0xffffffff80017746 <+38>:	ld	s0,32(sp)
   0xffffffff80017748 <+40>:	ld	s1,24(sp)
   0xffffffff8001774a <+42>:	ld	s2,16(sp)
   0xffffffff8001774c <+44>:	ld	s3,8(sp)
   0xffffffff8001774e <+46>:	addi	sp,sp,48
   0xffffffff80017750 <+48>:	ret
   [Tracing slowpath continues to 202.]

i.e. we spill 3 extra registers, which is at least better than v1.  I'll
try again with GCC 15.1, and get back if it actually improves the situation.


