[PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V

Mon Mar 11 07:24:21 PDT 2024

On Thu, Mar 7, 2024 at 11:57 PM Samuel Holland
<samuel.holland at sifive.com> wrote:
>
> Hi Alex,
>
> On 2024-03-07 7:21 AM, Alexandre Ghiti wrote:
> > But TBH, I have started thinking about the issue your patch is trying to deal
> > with. IIUC you're trying to avoid traps (or silent errors) that could happen
> > because of concurrent accesses when patching is happening on a pair auipc/jarl.
> >
> > I'm wondering if instead, we could not actually handle the potential traps:
> > before storing the auipc + jalr pair, we could use a well-identified trapping
> > instruction that could be recognized in the trap handler as a legitimate trap.
> > For example:
> >
> >
> > auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
> > jalr        XXXX        XXXX       jalr       jalr
> >
> >
> > If a core traps on a XXXX instruction, we know this address is being patched, so
> > we can return and probably the patching will be over. We could also identify
> > half patched word instruction (I mean with only XX).
>
> Unfortunately this does not work without some fence.i in the middle. The
> processor is free to fetch any instruction that has been written to a location
> since the last fence.i instruction. So it would be perfectly valid to fetch the
> old aiupc and new jalr or vice versa and not trap. This would happen if, for
> example, the two instructions were in different cache lines, and only one of the
> cache lines got evicted and refilled.
>
> But sending an IPI to run the fence.i probably negates the performance benefit.

Maybe something like x86, we can hook ftrace_replace_code() out and
batch send IPIs to prevent storms of remote fences. The solution Alex
proposed can save the code size for function entries. But we have to
send out remote fences at each "-->" transition, which is 4 sets of
remote IPIs. On the other hand, this series increases the per-function
patch size to 24 bytes. However, it decreases the number of remote
fences to 1 set.

The performance hit could be observable for the auipc + jalr case,
because all remote cores will be executing on XXXX instructions and
take a trap at each function entry during code patching.

Besides, this series would give us a chance not to send any remote
fences if we were to change only the destination of ftrace (e.g. to a
custom ftrace trampoline). As it would be a regular store for the
writer and regular load for readers, only fence w,w is needed.
However, I am not very certain on how often would be for this
particular use case. I'd need some time to investigate it.

>
> Maybe there is some creative way to overcome this.
>
> > But please let me know if that's completely stupid and I did not understand the
> > problem, since my patchset to support svvptc, I am wondering if it is not more
> > performant to actually take very unlikely traps instead of trying to avoid them.
>
> I agree in general it is a good idea to optimize the hot path like this.
>
> Regards,
> Samuel
>

Regards,
Andy