[PATCH -next V7 0/7] riscv: Optimize function trace

Guo Ren guoren at kernel.org
Wed Feb 8 17:51:03 PST 2023


On Thu, Feb 9, 2023 at 6:29 AM David Laight <David.Laight at aculab.com> wrote:
>
> > >   # Note: aligned to 8 bytes
> > >   addr-08               // Literal (first 32-bits)      // patched to ops ptr
> > >   addr-04               // Literal (last 32-bits)       // patched to ops ptr
> > >   addr+00       func:   mv      t0, ra
> > We needn't "mv t0, ra" here because our "jalr" could work with t0 and
> > won't affect ra. Let's do it in the trampoline code, and then we can
> > save another word here.
> > >   addr+04               auipc   t1, ftrace_caller
> > >   addr+08               jalr    ftrace_caller(t1)
>
> Is that some kind of 'load high' and 'add offset' pair?
Yes.

> I guess 64bit kernels guarantee to put all module code
> within +-2G of the main kernel?
Yes, 32-bit is enough. So we only need one 32-bit literal size for the
current rv64, just like CONFIG_32BIT.

>
> > Here is the call-site:
> >    # Note: aligned to 8 bytes
> >    addr-08               // Literal (first 32-bits)      // patched to ops ptr
> >    addr-04               // Literal (last 32-bits)       // patched to ops ptr
> >    addr+00               auipc   t0, ftrace_caller
> >    addr+04               jalr    ftrace_caller(t0)
>
> Could you even do something like:
>         addr-n  call ftrace-function
>         addr-n+x        literals
>         addr+0  nop or jmp addr-n
>         addr+4  function_code
Yours cost one more instruction, right?
         addr-12  auipc
         addr-8    jalr
         addr-4    // Literal (32-bits)
         addr+0   nop or jmp addr-n // one more?
         addr+4   function_code

> So that all the code executed when tracing is enabled
> is before the label and only one 'nop' is in the body.
> The called code can use the return address to find the
> literals and then modify it to return to addr+4.
> The code cost when trace is enabled is probably irrelevant
> here - dominated by what happens later.
> It probably isn't even worth aligning a 64bit constant.
> Doing two reads probably won't be noticable.
>
> What you do want to ensure is that the initial patch is
> overwriting nop - just in case the gap isn't there.
>
>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)



-- 
Best Regards
 Guo Ren



More information about the linux-riscv mailing list