[PATCH] ftrace: Fixup lockdep assert held of text_mutex

Palmer Dabbelt palmer at dabbelt.com
Mon Aug 24 20:29:32 EDT 2020


On Thu, 13 Aug 2020 08:37:43 PDT (-0700), rostedt at goodmis.org wrote:
> On Wed, 12 Aug 2020 22:13:19 -0700 (PDT)
> Palmer Dabbelt <palmer at dabbelt.com> wrote:
>
>> Sorry, I'm not really sure what's going on here.  I'm not really seeing code
>> that matches this in our port right now, so maybe this is aginst some other
>> tree?  If it's the RISC-V kprobes patch set then I was hoping to take a look at
>> that tomorrow (or I guess a bit earlier this week, but I had some surprise work
>> stuff to do).  IIRC there were a handful of races in the last patch set I saw,
>> but it's been a while so I don't remember for sure.
>>
>> That said, I certainly wouldn't be surprised if there's a locking bug in our
>> ftrace stuff.  It'd be way easier for me to figure out what's going on if you
>> have a concrete suggestion as to how to fix the issues -- even if it's just a
>> workaround.
>
> The issue is actually quite basic.
>
> ftrace_init_nop() is called quite early in boot up and never called
> again. It's called before SMP is set up, so it's on a single CPU, and
> no worries about synchronization with other CPUs is needed.
>
> On x86, it is called before text_poke() is initialized (which is used
> to synchronize code updates across CPUs), and thus can't be called.
> There's a "text_poke_early()" that is used instead, which is basically
> just a memcpy().
>
> Now, if ftrace_init_nop() is not defined by the architecture, it is a
> simple call to ftrace_make_nop(), which is also used to disable ftrace
> callbacks.
>
> The issue is that we have the following path on riscv:
>
>  ftrace_init_nop()
>    ftrace_make_nop()
>      __ftrace_modify_call()
>        patch_text_nosync()
>          patch_insn_write()
>            lockdep_assert_held(&text_mutex);
>
> Boom! text_mutex is not held, and lockdep complains.
>
>
> The difference between ftrace_make_nop() being called by
> ftrace_init_nop() and being called later to disable function tracing is
> that the latter will have:
>
>
> 	ftrace_arch_code_modify_prepare();
> 	[..]
> 	ftrace_make_nop();
> 	[..]
> 	ftrace_arch_code_modify_post_process();
>
> and the former will not have those called.
>
> On x86, we handle the two different cases with:
>
>
> static int ftrace_poke_late = 0;
>
> int ftrace_arch_code_modify_prepare(void)
> {
> 	mutex_lock(&text_mutex);
> 	ftrace_poke_late = 1;
> 	return 0;
> }
>
> int ftrace_arch_code_modify_post_process(void)
> {
> 	text_poke_finish();
> 	ftrace_poke_late = 0;
> 	mutex_unlock(&text_mutex);
> }
>
> Although, the post_process() probably doesn't even need to set
> ftrace_poke_late back to zero.
>
> Then in ftrace_make_nop(), we have:
>
>   ftrace_make_nop()
>     ftrace_modify_code_direct()
>       if (ftrace_poke_late)
>         text_poke_queue(...); // this checks if text_mutex is held
>       else
>         text_poke_early(...); // is basically just memcpy, no test on text_mutex.
>
> The two solutions for riscv, is either to implement the same thing as
> above, or you can create your own ftrace_init_nop() to take the
> text_mutex before calling ftrace_make_nop(), and that should work too.

Ya, thanks, that's a pretty straight-forward issue.  I've To'd you on a patch,
but it's essentially just exactly what you suggested so I doubt it's that
interesting.

I pointed out in the patch notes that it seems reasonable to have the generic
code handle this case, would you be opposed to doing it that way?



More information about the linux-riscv mailing list