[PATCH] riscv: Fix text patching when icache flushes use IPIs
Alexandre Ghiti
alexghiti at rivosinc.com
Thu Feb 8 05:23:02 PST 2024
Hi Andrea,
On Thu, Feb 8, 2024 at 12:42 PM Andrea Parri <parri.andrea at gmail.com> wrote:
>
> > +static int __ftrace_modify_code(void *data)
> > +{
> > +	struct ftrace_modify_param *param = data;
> > +
> > +	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
> > +		ftrace_modify_all_code(param->command);
> > +		atomic_inc(&param->cpu_count);
>
> I stared at ftrace_modify_all_code() for a bit but honestly I don't see
> what prevents the ->cpu_count increment from being reordered before the
> insn write(s) (architecturally) now that you have removed the IPI dance:
> perhaps add an smp_wmb() right before the atomic_inc() (or promote the
> latter to a (void)atomic_inc_return_release()) and/or an inline comment
> saying why such reordering is not possible?
I did not even think of that, and it actually makes sense, so I'll go
with what you propose: I'll replace atomic_inc() with
atomic_inc_return_release(), and I'll add the following comment if
that's ok with you:
"Make sure the patching store is effective *before* we increment the
counter which releases all waiting cpus"
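Something like this, as a quick sketch (the function as posted above,
with only the release increment and the comment changed):

static int __ftrace_modify_code(void *data)
{
	struct ftrace_modify_param *param = data;

	if (atomic_inc_return(&param->cpu_count) == num_online_cpus()) {
		ftrace_modify_all_code(param->command);
		/*
		 * Make sure the patching store is effective *before* we
		 * increment the counter which releases all waiting cpus.
		 */
		atomic_inc_return_release(&param->cpu_count);
	} else {
		while (atomic_read(&param->cpu_count) <= num_online_cpus())
			cpu_relax();
		smp_mb();	/* kept for now, see below */
	}

	local_flush_icache_all();

	return 0;
}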
>
>
> > +	} else {
> > +		while (atomic_read(&param->cpu_count) <= num_online_cpus())
> > +			cpu_relax();
> > +		smp_mb();
>
> I see that you've lifted/copied the memory barrier from patch_text_cb():
> what's its point? AFAIU, the barrier has no ordering effect on
> program-order-later insn fetches; perhaps the code was based on some
> legacy/old version of Zifencei? IAC, comments, comments, ... or maybe
> just remove that memory barrier?
Honestly, I looked at it for a minute, did not understand its purpose,
and said to myself "ok, that can't hurt anyway, I may be missing
something".
FWIW, I see that arm64 uses isb() here. If you don't see its purpose
either, I'll remove it (here and wherever I copied it).
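In that case the waiting side would boil down to this (again just a
sketch, on the understanding that the fence.i executed by
local_flush_icache_all() is what actually resynchronizes the local
instruction stream):

	} else {
		while (atomic_read(&param->cpu_count) <= num_online_cpus())
			cpu_relax();
		/*
		 * No smp_mb() here: a data memory barrier cannot order
		 * later instruction fetches anyway; the fence.i in
		 * local_flush_icache_all() below takes care of that.
		 */
	}

	local_flush_icache_all();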
>
>
> > +	}
> > +
> > +	local_flush_icache_all();
> > +
> > +	return 0;
> > +}
>
> [...]
>
>
> > @@ -232,8 +230,7 @@ static int patch_text_cb(void *data)
> >  	if (atomic_inc_return(&patch->cpu_count) == num_online_cpus()) {
> >  		for (i = 0; ret == 0 && i < patch->ninsns; i++) {
> >  			len = GET_INSN_LENGTH(patch->insns[i]);
> > -			ret = patch_text_nosync(patch->addr + i * len,
> > -						&patch->insns[i], len);
> > +			ret = patch_insn_write(patch->addr + i * len, &patch->insns[i], len);
> >  		}
> >  		atomic_inc(&patch->cpu_count);
> >  	} else {
> > @@ -242,6 +239,8 @@
> >  		smp_mb();
> >  	}
> >
> > +	local_flush_icache_all();
> > +
> >  	return ret;
> >  }
> >  NOKPROBE_SYMBOL(patch_text_cb);
>
> My above remarks/questions also apply to this function.
>
>
> On a last topic, although somewhat orthogonal to the scope of this patch,
> I'm not sure the patch_{map,unmap}() dance in our patch_insn_write() is
> correct: I can see why we may want (need) to do the local TLB flush
> before returning from patch_{map,unmap}(), but does a local flush
> suffice? For comparison, arm64 seems to go through a complete
> dsb-tlbi-dsb(-isb) sequence in their unmapping stage (and apparently
> relies on "no caching of invalid ptes" in their mapping stage). Of
> course, "broadcasting" our (riscv's) TLB invalidations will necessarily
> introduce some complexity...
>
> Thoughts?
To avoid remote TLBIs, could we simply disable preemption before the
first patch_map()? arm64 disables irqs, which seems like overkill to
me, but maybe I'm missing something again?
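Roughly like this (a hypothetical sketch only, ignoring the case where
the patched instructions cross a page boundary and FIX_TEXT_POKE1 is
needed as well):

static int patch_insn_write(void *addr, const void *insn, size_t len)
{
	void *waddr;
	int ret;

	lockdep_assert_held(&text_mutex);

	/*
	 * Keep the map -> write -> unmap sequence on a single hart:
	 * the mapping is then created, used and torn down locally,
	 * so the local TLB flush done by patch_unmap() cleans up
	 * after ourselves without a remote TLBI.
	 */
	preempt_disable();
	waddr = patch_map(addr, FIX_TEXT_POKE0);
	ret = copy_to_kernel_nofault(waddr, insn, len);
	patch_unmap(FIX_TEXT_POKE0);
	preempt_enable();

	return ret;
}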
Thanks for your comments, Andrea,
Alex
>
> Andrea