RISC-V uprobe bug (Was: Re: WARNING: CPU: 3 PID: 261 at kernel/bpf/memalloc.c:342)

Björn Töpel bjorn at kernel.org
Sat Aug 26 06:44:48 PDT 2023


Björn Töpel <bjorn at kernel.org> writes:

> I'm chasing a workqueue hang on RISC-V/qemu (TCG), using the bpf
> selftests on bpf-next 9e3b47abeb8f.
>
> I'm able to reproduce the hang by multiple runs of:
>  | ./test_progs -a link_api -a linked_list
> I'm currently investigating that.

+Guo for uprobe

This was an interesting bug. The hang is an ebreak (RISC-V breakpoint),
that puts the kernel into an infinite loop.

To reproduce, simply run the BPF selftest:
./test_progs -v -a link_api -a linked_list

First the link_api test is being run, which exercises the uprobe
functionality. The link_api test completes, and test_progs will still
have the uprobe active/enabled. Next the linked_list test triggered a
WARN_ON (which is implemented via ebreak as well).

Now, handle_break() is entered, and the uprobe_breakpoint_handler()
returns true exiting the handle_break(), which returns to the WARN
ebreak, and we have merry-go-round.

Lucky for the RISC-V folks, the BPF memory handler had a WARN that
surfaced the bug! ;-)

This patch fixes the issue, but it's probably a prettier variant:
--8<--
diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index f798c853bede..1198cb879d2f 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -248,23 +248,29 @@ static inline unsigned long get_break_insn_length(unsigned long pc)
 
 void handle_break(struct pt_regs *regs)
 {
+       bool user = user_mode(regs);
+
 #ifdef CONFIG_KPROBES
-       if (kprobe_single_step_handler(regs))
-               return;
+       if (!user) {
+               if (kprobe_single_step_handler(regs))
+                       return;
 
-       if (kprobe_breakpoint_handler(regs))
-               return;
+               if (kprobe_breakpoint_handler(regs))
+                       return;
+       }
 #endif
 #ifdef CONFIG_UPROBES
-       if (uprobe_single_step_handler(regs))
-               return;
+       if (user) {
+               if (uprobe_single_step_handler(regs))
+                       return;
 
-       if (uprobe_breakpoint_handler(regs))
-               return;
+               if (uprobe_breakpoint_handler(regs))
+                       return;
+       }
 #endif
        current->thread.bad_cause = regs->cause;
 
-       if (user_mode(regs))
+       if (user)
                force_sig_fault(SIGTRAP, TRAP_BRKPT, (void __user *)regs->epc);
 #ifdef CONFIG_KGDB
        else if (notify_die(DIE_TRAP, "EBREAK", regs, 0, regs->cause, SIGTRAP)
--8<--

I'll cook a cleaner/proper patch for this, unless the uprobes folks has
a better solution.


Björn



More information about the linux-riscv mailing list