Is RISC-V Static Call worth implementing?

Fri Sep 13 15:42:59 PDT 2024

On Fri, Sep 13, 2024 at 03:05:33AM -0500, Juhan Jin wrote:
> Hi folks,
> 
> I’m interested in implementing Static Call for RISC-V, and I want to
> know whether it is worth the effort.
> 
> In summary, Static Call is a mechanism that works similarly to global
> function pointers. A simple use case is as follows:
> 
> int func_a(int arg1, int arg2);
> // define a static call named my_name pointing to func_a
> DEFINE_STATIC_CALL(my_name, func_a); 
> // call func_a through static call my_name
> static_call(my_name)(arg1, arg2);
> // make my_name point to a new function func_b
> static_call_update(my_name, &func_b);
> // call func_b
> static_call(my_name)(arg1, arg2);
> 
> The advantage of a static call over a function pointer is that static
> calls are direct calls whereas function pointers are indirect calls.
> On x86, direct calls are much faster than indirect calls when you
> consider speculation mitigation options such as retpoline. So Static
> Call is meaningful and has already been implemented for x86.

Since retpolines are not used on riscv, I am not sure if there is a
meaningful benefit for static calls. Without retpolines, branch
predictors should be pretty good at jumping to the correct place.
For static calls to be a win, the time they save over indirect calls
has to outweigh the overhead of setting up the static call.
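
For reference, the function-pointer baseline being compared against can
be sketched in plain userspace C. This is only an illustration: the
bodies of func_a and func_b and the helpers call_my_name and
update_my_name are hypothetical stand-ins, not kernel code.

```c
/* Hypothetical stand-ins for func_a and func_b from the quoted example. */
static int func_a(int arg1, int arg2) { return arg1 + arg2; }
static int func_b(int arg1, int arg2) { return arg1 * arg2; }

/*
 * Global function pointer: each call loads the target address from
 * memory and then jumps to it indirectly.
 */
static int (*my_name)(int, int) = func_a;

static int call_my_name(int arg1, int arg2)
{
	return my_name(arg1, arg2);
}

/*
 * Retargeting is a single pointer store, whereas static_call_update()
 * has to patch code.
 */
static void update_my_name(int (*func)(int, int))
{
	my_name = func;
}
```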

> 
> For RISC-V, a general indirect call is like this:
> 
> auipc a5, imm
> # load the value of function pointer into a5
> ld    a5, imm(a5)
> # with the address of target function in a5, we can now jump to it
> jalr  ra, 0(a5)
> 
> There are two versions of Static Call: out-of-line and inline. The
> inline version builds on top of the out-of-line version and is
> faster.
> 
> For an out-of-line static call, the static call first jumps to a
> trampoline, then jumps to the actual function. The best approach I 
> can come up with is a three-instruction trampoline. Three instructions
> plus two instructions (AUIPC JALR) to jump to the trampoline equals
> five.
> 
> Five instructions with no mem op versus three instructions with one
> mem op. Not sure which one is faster.
> 
> For an inline static call, the static call directly jumps to the
> target function. I discussed with Peter Zijlstra, one of the
> maintainers of Static Call. I guess we can use the two regular
> instructions AUIPC and JALR to jump to a target function.

The JAL instruction has a 20-bit immediate, which gives it a reach of
+/-1 MiB around the PC. So if the function is within that range, the
AUIPC+JALR could just be a JAL (falling back to AUIPC+JALR if it is
too far).
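
A rough sketch of that range check, plus the usual hi/lo split for the
AUIPC+JALR fallback, is below. The helper names fits_jal and
split_auipc_jalr are made up for illustration and are not existing
kernel functions. The split assumes the offset fits in the signed
32-bit range that AUIPC+JALR can reach.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * JAL encodes a signed 21-bit byte offset whose lowest bit is always
 * zero, so it reaches [-1 MiB, +1 MiB - 2] relative to the PC.
 */
static bool fits_jal(int64_t offset)
{
	return offset >= -(INT64_C(1) << 20) &&
	       offset <   (INT64_C(1) << 20) &&
	       (offset & 1) == 0;
}

/*
 * Split a PC-relative offset into the AUIPC upper part (hi20) and the
 * signed 12-bit JALR immediate (lo12), so that
 * offset == hi20 * 4096 + lo12. JALR sign-extends its immediate, so
 * hi20 is rounded up whenever lo12 comes out negative.
 */
static void split_auipc_jalr(int64_t offset, int64_t *hi20, int64_t *lo12)
{
	/* Sign-extend the low 12 bits of the offset. */
	*lo12 = ((offset & 0xfff) ^ 0x800) - 0x800;
	/* The remainder is an exact multiple of 4096. */
	*hi20 = (offset - *lo12) / 4096;
}
```

A patching routine could then emit a single JAL when fits_jal() holds
for (target - pc), and the AUIPC+JALR pair otherwise.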

> 
> Two instructions with no mem op versus three instructions with one mem
> op. An inline static call should be faster than an indirect call.
> 
> Do the aforementioned benefits merit a RISC-V static call implementation
> (especially inline)? Or are the benefits so negligible that it’s simply not
> worth the effort to do a RISC-V implementation?
> 

I am less convinced about the out-of-line case, but I think the inline
case (especially when a JAL can be used) is promising.

- Charlie

> It should be noted that updating a static call is much more troublesome
> than updating a function pointer. So static calls are suitable to
> replace function pointers that don’t change often. One scenario is
> tracepoints. With inline static calls, RISC-V tracepoint performance
> should improve. Not sure by how much, though.
>  
> Best,
> Juhan