[PATCH RFC v2 riscv/for-next 0/5] Enable ftrace with kernel preemption for RISC-V

Alexandre Ghiti alex at ghiti.fr
Tue Mar 19 07:50:01 PDT 2024


On 11/03/2024 15:24, Andy Chiu wrote:
> On Thu, Mar 7, 2024 at 11:57 PM Samuel Holland
> <samuel.holland at sifive.com> wrote:
>> Hi Alex,
>>
>> On 2024-03-07 7:21 AM, Alexandre Ghiti wrote:
>>> But TBH, I have started thinking about the issue your patch is trying to deal
>>> with. IIUC you're trying to avoid traps (or silent errors) that could happen
>>> because of concurrent accesses while an auipc/jalr pair is being patched.
>>>
>>> I'm wondering if instead, we could not actually handle the potential traps:
>>> before storing the auipc + jalr pair, we could use a well-identified trapping
>>> instruction that could be recognized in the trap handler as a legitimate trap.
>>> For example:
>>>
>>>
>>> auipc  -->  auipc  -->  XXXX  -->  XXXX  -->  auipc
>>> jalr        XXXX        XXXX       jalr       jalr
>>>
>>>
>>> If a core traps on a XXXX instruction, we know this address is being patched, so
>>> we can return, and by then the patching will probably be over. We could also
>>> recognize a half-patched instruction word (i.e. one where only XX has been
>>> written).
>> Unfortunately this does not work without some fence.i in the middle. The
>> processor is free to fetch any instruction that has been written to a location
>> since the last fence.i instruction. So it would be perfectly valid to fetch the
>> old auipc and new jalr, or vice versa, and not trap. This would happen if, for
>> example, the two instructions were in different cache lines, and only one of the
>> cache lines got evicted and refilled.
>>
>> But sending an IPI to run the fence.i probably negates the performance benefit.
> Maybe, like x86, we can hook ftrace_replace_code() and batch the IPIs
> to prevent storms of remote fences. The solution Alex
> proposed can save the code size for function entries. But we have to
> send out remote fences at each "-->" transition, which is 4 sets of
> remote IPIs. On the other hand, this series increases the per-function
> patch size to 24 bytes. However, it decreases the number of remote
> fences to 1 set.
>
> The performance hit could be observable for the auipc + jalr case,
> because all remote cores would be executing the XXXX instructions and
> taking a trap at each function entry while code patching is in progress.
>
> Besides, this series would give us a chance not to send any remote
> fences if we were to change only the destination of ftrace (e.g. to a
> custom ftrace trampoline). As it would be a regular store for the
> writer and regular load for readers, only fence w,w is needed.
> However, I am not very certain how often this particular use case
> would occur. I'd need some time to investigate it.
>
>> Maybe there is some creative way to overcome this.
>>
>>> But please let me know if that's completely stupid and I did not understand
>>> the problem. Since my patchset to support svvptc, I have been wondering
>>> whether it is not more performant to actually take very unlikely traps
>>> instead of trying to avoid them.
>> I agree in general it is a good idea to optimize the hot path like this.
>>
>> Regards,
>> Samuel
>>
> Regards,
> Andy
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv


So indeed my solution was way too naive, and Björn and I have been
discussing this lately. He has worked a lot on it and came up with the
solution he proposed here:
https://lore.kernel.org/linux-riscv/87zfv0onre.fsf@all.your.base.are.belong.to.us/

The thing is, ftrace seems to be quite broken: the ftrace kselftests raise
a lot of issues, which I have started to debug but which are not that easy.
So we are wondering whether someone should work on Björn's solution (or
another one, I'm open to discussion) for 6.10. @Andy WDYT? Do you have free
cycles? Björn could work on that too (and I'll help if needed).

Let me know what you think!

Alex




