[PATCH v3] kprobe/ftrace: bail out if ftrace was killed

Tue May 7 11:36:22 PDT 2024

Christophe Leroy <christophe.leroy at csgroup.eu> writes:
> Le 01/05/2024 à 18:29, Stephen Brennan a écrit :
>> If an error happens in ftrace, ftrace_kill() will prevent disarming
>> kprobes. Eventually, the ftrace_ops associated with the kprobes will be
>> freed, yet the kprobes will still be active, and when triggered, they
>> will use the freed memory, likely resulting in a page fault and panic.
>> 
>> This behavior can be reproduced quite easily, by creating a kprobe and
>> then triggering a ftrace_kill(). For simplicity, we can simulate an
>> ftrace error with a kernel module like [1]:
>> 
>> [1]: https://github.com/brenns10/kernel_stuff/tree/master/ftrace_killer
>> 
>>    sudo perf probe --add commit_creds
>>    sudo perf trace -e probe:commit_creds
>>    # In another terminal
>>    make
>>    sudo insmod ftrace_killer.ko  # calls ftrace_kill(), simulating bug
>>    # Back to perf terminal
>>    # ctrl-c
>>    sudo perf probe --del commit_creds
>> 
>> After a short period, a page fault and panic would occur as the kprobe
>> continues to execute and uses the freed ftrace_ops. While ftrace_kill()
>> is supposed to be used only in extreme circumstances, it is invoked in
>> FTRACE_WARN_ON() and so there are many places where an unexpected bug
>> could be triggered, yet the system may continue operating, possibly
>> without the administrator noticing. If ftrace_kill() does not panic the
>> system, then we should do everything we can to continue operating,
>> rather than leave a ticking time bomb.
>> 
>> Signed-off-by: Stephen Brennan <stephen.s.brennan at oracle.com>
>> ---
>> Changes in v3:
>>    Don't expose ftrace_is_dead(). Create a "kprobe_ftrace_disabled"
>>    variable and check it directly in the kprobe handlers.
>
> Isn't it safer to provide a fonction rather than a direct access to a 
> variable ?

Is the concern that other code could modify this variable? If so, then I
suppose the function call is safer. But the variable is not exported and
I think built-in code can be trusted not to muck with it. Maybe I'm
missing your point about safety though?

> By the way, wouldn't it be more performant to use a static branch (jump 
> label) ?

I agree with Steven's concern that text modification would unfortunately
not be a good way to handle an error in text modification. Especially, I
believe there could be deadlock risks, as static key enablement requires
taking the text_mutex and the jump_label_mutex. I'd be concerned that
the text_mutex could already be held in some situations where
ftrace_kill() is called. But I'm not certain about that.

Thanks for taking a look!
Stephen