RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342)

Sun Aug 27 01:11:25 PDT 2023

Nam,

Nam Cao <namcaov at gmail.com> writes:

> On Sat, Aug 26, 2023 at 03:44:48PM +0200, Björn Töpel wrote:
>> Björn Töpel <bjorn at kernel.org> writes:
>> 
>> > I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
>> > selftests on bpf-next 9e3b47abeb8f.
>> >
>> > I'm able to reproduce the hang by multiple runs of:
>> >  | ./test_progs -a link_api -a linked_list
>> > I'm currently investigating that.
>> 
>> +Guo for uprobe
>> 
>> This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
>> that puts the kernel into an infinite loop.
>> 
>> To reproduce, simply run the BPF selftest:
>> ./test_progs -v -a link_api -a linked_list
>> 
>> First the link_api test is being run, which exercises the uprobe
>> functionality. The link_api test completes, and test_progs will still
>> have the uprobe active/enabled. Next the linked_list test triggered a
>> WARN_ON (which is implemented via ebreak as well).
>> 
>> Now, handle_break() is entered, and the uprobe_breakpoint_handler()
>> returns true exiting the handle_break(), which returns to the WARN
>> ebreak, and we have merry-go-round.
>> 
>> Lucky for the RISC-V folks, the BPF memory handler had a WARN that
>> surfaced the bug! ;-)
>
> Thanks for the analysis.
>
> I couldn't reproduce the problem, so I am just taking a guess here. The problem
> is bebcause uprobes didn't find a probe point at that ebreak instruction. However,
> it also doesn't think a ebreak instruction is there, then it got confused and just
> return back to the ebreak instruction, then everything repeats.
>
> The reason why uprobes didn't think there is a ebreak instruction is because
> is_trap_insn() only returns true if it is a 32-bit ebreak, or 16-bit c.ebreak if
> C extension is available, not both. So a 32-bit ebreak is not correctly recognized
> as a trap instruction.
>
> If my guess is correct, the following should fix it. Can you please try if it works?
>
> (this is the first time I send a patch this way, so please let me know if you can't apply)
>
> Best regards,
> Nam
>
> ---
>  arch/riscv/kernel/probes/uprobes.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/riscv/kernel/probes/uprobes.c b/arch/riscv/kernel/probes/uprobes.c
> index 194f166b2cc4..91f4ce101cd1 100644
> --- a/arch/riscv/kernel/probes/uprobes.c
> +++ b/arch/riscv/kernel/probes/uprobes.c
> @@ -3,6 +3,7 @@
>  #include <linux/highmem.h>
>  #include <linux/ptrace.h>
>  #include <linux/uprobes.h>
> +#include <asm/insn.h>
>  
>  #include "decode-insn.h"
>  
> @@ -17,6 +18,15 @@ bool is_swbp_insn(uprobe_opcode_t *insn)
>  #endif
>  }
>  
> +bool is_trap_insn(uprobe_opcode_t *insn)
> +{
> +#ifdef CONFIG_RISCV_ISA_C
> +	if (riscv_insn_is_c_ebreak(*insn))
> +		return true;
> +#endif
> +	return riscv_insn_is_ebreak(*insn);
> +}
> +
>  unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
>  {
>  	return instruction_pointer(regs);
> -- 
> 2.34.1

The default implementation of is_trap_insn() which RISC-V is using calls
is_swbp_insn(), which is doing what your patch does. Your patch does not
address the issue.

We're taking an ebreak trap from kernel space. In this case we should
never look for a userland (uprobe) handler at all, only the kprobe
handlers should be considered.

In this case, the TIF_UPROBE is incorrectly set, and incorrectly (not)
handled in the "common entry" exit path, which takes us to the infinite
loop.

Björn