[PATCH] riscv: Fix early ftrace nop patching

Mon Jun 24 01:08:30 PDT 2024

On 20/06/2024 19:03, Alexandre Ghiti wrote:
>
> On 19/06/2024 05:40, Andy Chiu wrote:
>> On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti 
>> <alexghiti at rivosinc.com> wrote:
>>> Hi Andy,
>>>
>>> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu at sifive.com> wrote:
>>>> On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex at ghiti.fr> wrote:
>>>>> Hi Conor,
>>>>>
>>>>> On 17/06/2024 15:23, Alexandre Ghiti wrote:
>>>>>> Hi Conor,
>>>>>>
>>>>>> Sorry for the delay here.
>>>>>>
>>>>>> On 13/06/2024 09:48, Conor Dooley wrote:
>>>>>>> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
>>>>>>>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
>>>>>>>> converted ftrace_make_nop() to use patch_insn_write() which 
>>>>>>>> does not
>>>>>>>> emit any icache flush relying entirely on 
>>>>>>>> __ftrace_modify_code() to do
>>>>>>>> that.
>>>>>>>>
>>>>>>>> But we missed that ftrace_make_nop() was called very early 
>>>>>>>> directly
>>>>>>>> when
>>>>>>>> converting mcount calls into nops (actually on riscv it 
>>>>>>>> converts 2B
>>>>>>>> nops
>>>>>>>> emitted by the compiler into 4B nops).
>>>>>>>>
>>>>>>>> This caused crashes on multiple HW as reported by Conor and 
>>>>>>>> Björn since
>>>>>>>> the booting core could have half-patched instructions in its 
>>>>>>>> icache
>>>>>>>> which would trigger an illegal instruction trap: fix this by 
>>>>>>>> emitting a
>>>>>>>> local flush icache when early patching nops.
>>>>>>>>
>>>>>>>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
>>>>>>>> Signed-off-by: Alexandre Ghiti <alexghiti at rivosinc.com>
>>>>>>>> ---
>>>>>>>>    arch/riscv/include/asm/cacheflush.h | 6 ++++++
>>>>>>>>    arch/riscv/kernel/ftrace.c          | 3 +++
>>>>>>>>    2 files changed, 9 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/arch/riscv/include/asm/cacheflush.h
>>>>>>>> b/arch/riscv/include/asm/cacheflush.h
>>>>>>>> index dd8d07146116..ce79c558a4c8 100644
>>>>>>>> --- a/arch/riscv/include/asm/cacheflush.h
>>>>>>>> +++ b/arch/riscv/include/asm/cacheflush.h
>>>>>>>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
>>>>>>>>        asm volatile ("fence.i" ::: "memory");
>>>>>>>>    }
>>>>>>>>    +static inline void local_flush_icache_range(unsigned long 
>>>>>>>> start,
>>>>>>>> +                        unsigned long end)
>>>>>>>> +{
>>>>>>>> +    local_flush_icache_all();
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>    #define PG_dcache_clean PG_arch_1
>>>>>>>>      static inline void flush_dcache_folio(struct folio *folio)
>>>>>>>> diff --git a/arch/riscv/kernel/ftrace.c 
>>>>>>>> b/arch/riscv/kernel/ftrace.c
>>>>>>>> index 4f4987a6d83d..32e7c401dfb4 100644
>>>>>>>> --- a/arch/riscv/kernel/ftrace.c
>>>>>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>>>>>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
>>>>>>>> dyn_ftrace *rec)
>>>>>>>>        out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>>>>>>        mutex_unlock(&text_mutex);
>>>>>>> So, turns out that this patch is not sufficient. I've seen some 
>>>>>>> more
>>>>>>> crashes, seemingly due to initialising nops on this mutex_unlock().
>>>>>>> Palmer suggested moving the if (!mod) ... into the lock, which 
>>>>>>> fixed
>>>>>>> the problem for me.
>>>>>>
>>>>>> Ok, it makes sense, I completely missed that. I'll send a fix for 
>>>>>> that
>>>>>> shortly so that it can be merged in rc5.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Alex
>>>>>
>>>>> So I digged a bit more and I'm afraid that the only way to make sure
>>>>> this issue does not happen elsewhere is to flush the icache right 
>>>>> after
>>>>> the patching. We actually can't wait to batch the icache flush since
>>>>> along the way, we may call a function that has just been patched (the
>>>>> issue that you encountered here).
>>>> Trying to provide my thoughts, please let me know if I missed
>>>> anything. I think what Conor suggested is safe for init_nop, as it
>>>> would be called only when there is only one core (booting) and at the
>>>> loading stage of kernel modules. In the first case we just have to
>>>> make sure there is no patchable entry before the core executes
>>>> fence.i. The second case is unconditionally safe because there is no
>>>> read-side of the race.
>>>>
>>>> It is a bit tricky for the generic (runtime) case of ftrace code
>>>> patching, but that is not because of the batch fence.i maintenance. As
>>>> long as there exists a patchable entry for the stopping thread, it is
>>>> possible for them to execute on a partially patched instruction. A
>>>> solution for this is again to prevent any patchable entry in the
>>>> stop_machine's stopping thread. Another solution is to apply the
>>>> atomic ftrace patching series which aims to get rid of the race.
>>> Yeah but my worry is that we can't make sure that functions called
>>> between the patching and the fence.i are not the ones that just get
>>> patched. At least today, patch_insn_write() etc should be marked as
>> IIUC, the compiler does not generate a patchable entry for
>> patch_insn_write, and so do all functions in patch.c. The entire file
>> is not compiled with patchable entry because the flag is removed in
>> riscv's Makefile. Please let me know if I misunderstand it.
>>
>>> noinstr. But even then, we call core functions that could be patchable
>>> (I went through all and it *seems* we are safe *for now*, but this is
>>> very fragile).
>> Yes, functions in the "else" clause, atomic_read() etc, are safe for
>> now. However, it is impossible to fix as long as there exists a
>> patchable entry along the way. This is the problem that we cannot
>> patch 2 instructions with a concurrent read-side. On the other hand,
>> the path of ftrace_modify_all_code may experience the batch fence.i
>> maintenance issue, and can be fixed via a local fence.i. This is
>> because the path is executed by only one core. There does not exist a
>> concurrent write-side. As a result, we just need to make sure it
>> leaves each patch-site with a local fence.i to make sure code is
>> up-to-date.
>>
>>> Then what do you think we should do to fix Conor's issue right now:
>>> should we mark the riscv specific functions as noinstr, cross fingers
>>> that the core functions are not (and don't become) patchable and wait
>>> for your atomic patching? Or should we flush as soon as possible as I
>>> proposed above (which to me fixes the issue but hurts performance)?
>> Either way works for me. IMHO, the fix should include:
>>
>> 1. move fence.i before mutex_unlock in init_nop
>> 2. do local fence.i in __ftrace_modify_call
>> 3. make sure the else clause does not happen to have a patchable entry
>
>
> OK I have just checked again and I agree with you on the whole fix :) 
> I may add some comments for 3 so that it is very clear.

And after another check, a few other callsites need to emit a sfence.i 
like ftrace_make_call() and ftrace_make_nop(), which in the end, amounts 
to adding a sfence.i right after the patching is done. So after a 
private discussion with Andy, we both agreed that the initial solution I 
posted was the safest. Admittedly, the impact on performance is unknown, 
so any input on this side would be appreciated.

@Conor: I'm sending this as a proper patch if you can give it a try so 
that we can merge that in rc6.

Thanks,

Alex

>
> Thanks Andy!
>
> Alex
>
>
>>
>> Thanks,
>> Andy
>>
>>> Thanks,
>>>
>>> Alex
>>>
>>>>> I don't know how much it will impact the performance but I guess 
>>>>> it will.
>>>>>
>>>>> Unless someone has a better idea (I added Andy and Puranjay in 
>>>>> cc), here
>>>>> is the patch that implements this, can you give it a try? Thanks
>>>>>
>>>>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>>>>> index 87cbd86576b2..4b95c574fd04 100644
>>>>> --- a/arch/riscv/kernel/ftrace.c
>>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>>> @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
>>>>> dyn_ftrace *rec)
>>>>>           out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>>>           mutex_unlock(&text_mutex);
>>>>>
>>>>> -       if (!mod)
>>>>> -               local_flush_icache_range(rec->ip, rec->ip +
>>>>> MCOUNT_INSN_SIZE);
>>>>> -
>>>>>           return out;
>>>>>    }
>>>>>
>>>>> @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
>>>>>           } else {
>>>>>                   while (atomic_read(&param->cpu_count) <= 
>>>>> num_online_cpus())
>>>>>                           cpu_relax();
>>>>> -       }
>>>>>
>>>>> -       local_flush_icache_all();
>>>>> +               local_flush_icache_all();
>>>>> +       }
>>>>>
>>>>>           return 0;
>>>>>    }
>>>>> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
>>>>> index 4007563fb607..ab03732d06c4 100644
>>>>> --- a/arch/riscv/kernel/patch.c
>>>>> +++ b/arch/riscv/kernel/patch.c
>>>>> @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, 
>>>>> size_t len)
>>>>>
>>>>>           memset(waddr, c, len);
>>>>>
>>>>> +       /*
>>>>> +        * We could have just patched a function that is about to be
>>>>> +        * called so make sure we don't execute partially patched
>>>>> +        * instructions by flushing the icache as soon as possible.
>>>>> +        */
>>>>> +       local_flush_icache_range((unsigned long)waddr,
>>>>> +                                (unsigned long)waddr + len);
>>>>> +
>>>>>           patch_unmap(FIX_TEXT_POKE0);
>>>>>
>>>>>           if (across_pages)
>>>>> @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
>>>>> void *insn, size_t len)
>>>>>
>>>>>           ret = copy_to_kernel_nofault(waddr, insn, len);
>>>>>
>>>>> +       /*
>>>>> +        * We could have just patched a function that is about to be
>>>>> +        * called so make sure we don't execute partially patched
>>>>> +        * instructions by flushing the icache as soon as possible.
>>>>> +        */
>>>>> +       local_flush_icache_range((unsigned long)waddr,
>>>>> +                                (unsigned long)waddr + len);
>>>>> +
>>>>>           patch_unmap(FIX_TEXT_POKE0);
>>>>>
>>>>>           if (across_pages)
>>>>> @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, 
>>>>> size_t len)
>>>>>
>>>>>           ret = patch_insn_set(tp, c, len);
>>>>>
>>>>> -       if (!ret)
>>>>> -               flush_icache_range((uintptr_t)tp, (uintptr_t)tp + 
>>>>> len);
>>>>> -
>>>>>           return ret;
>>>>>    }
>>>>>    NOKPROBE_SYMBOL(patch_text_set_nosync);
>>>>> @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void 
>>>>> *insns,
>>>>> size_t len)
>>>>>
>>>>>           ret = patch_insn_write(tp, insns, len);
>>>>>
>>>>> -       if (!ret)
>>>>> -               flush_icache_range((uintptr_t) tp, (uintptr_t) tp 
>>>>> + len);
>>>>> -
>>>>>           return ret;
>>>>>    }
>>>>>    NOKPROBE_SYMBOL(patch_text_nosync);
>>>>> @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
>>>>>           } else {
>>>>>                   while (atomic_read(&patch->cpu_count) <= 
>>>>> num_online_cpus())
>>>>>                           cpu_relax();
>>>>> -       }
>>>>>
>>>>> -       local_flush_icache_all();
>>>>> +               local_flush_icache_all();
>>>>> +       }
>>>>>
>>>>>           return ret;
>>>>>    }
>>>>>
>>>>>
>>>>>>
>>>>>>>>    +    if (!mod)
>>>>>>>> +        local_flush_icache_range(rec->ip, rec->ip + 
>>>>>>>> MCOUNT_INSN_SIZE);
>>>>>>>> +
>>>>>>>>        return out;
>>>>>>>>    }
>>>>>>>>    --
>>>>>>>> 2.39.2
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> linux-riscv mailing list
>>>>>>>> linux-riscv at lists.infradead.org
>>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>> _______________________________________________
>>>>>> linux-riscv mailing list
>>>>>> linux-riscv at lists.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>> Cheers,
>>>> Andy
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv