[PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
Dmitry V. Levin
ldv at strace.io
Thu Jun 27 00:14:22 PDT 2024
Hi,
On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
> When we test seccomp with 6.4 kernel, we found errno has wrong value.
> If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will
> get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv:
> entry: Save a0 prior syscall_enter_from_user_mode()").
>
> After analysing code, we think that regs->a0 = -ENOSYS should only be
> executed when syscall != -1. In __seccomp_filter, when seccomp rejected
> this syscall with specified errno, they will set a0 to return number as
> syscall ABI, and then return -1. This return number is finally pass as
> return number of syscall_enter_from_user_mode, and then is compared with
> NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
> condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
> is always executed. It covered a0 set by seccomp, so we always get
> ENOSYS when match seccomp RET_ERRNO rule.
>
> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
> Reported-by: Felix Yan <felixonmars at archlinux.org>
> Co-developed-by: Ruizhe Pan <c141028 at gmail.com>
> Signed-off-by: Ruizhe Pan <c141028 at gmail.com>
> Co-developed-by: Shiqi Zhang <shiqi at isrc.iscas.ac.cn>
> Signed-off-by: Shiqi Zhang <shiqi at isrc.iscas.ac.cn>
> Signed-off-by: Celeste Liu <CoelacanthusHex at gmail.com>
> Tested-by: Felix Yan <felixonmars at archlinux.org>
> Tested-by: Emil Renner Berthing <emil.renner.berthing at canonical.com>
> Reviewed-by: Björn Töpel <bjorn at rivosinc.com>
> Reviewed-by: Guo Ren <guoren at kernel.org>
> ---
>
> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing at canonical.com>
> v3 -> v4: use long instead of ulong to reduce type cast and avoid
> implementation-defined behavior, and make the judgment of syscall
> invalid more explicit
> v2 -> v3: use if-statement instead of set default value,
> clarify the type of syscall
> v1 -> v2: added explanation on why always got ENOSYS
>
> arch/riscv/kernel/traps.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index f910dfccbf5d2..729f79c97e2bf 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
>  asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>  {
>  	if (user_mode(regs)) {
> -		ulong syscall = regs->a7;
> +		long syscall = regs->a7;
>
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>
>  		syscall = syscall_enter_from_user_mode(regs, syscall);
>
> -		if (syscall < NR_syscalls)
> +		if (syscall >= 0 && syscall < NR_syscalls)
>  			syscall_handler(regs, syscall);
> -		else
> +		else if (syscall != -1)
>  			regs->a0 = -ENOSYS;
>
>  		syscall_exit_to_user_mode(regs);
Unfortunately, this change introduced a regression: it broke strace
syscall tampering on riscv. When the tracer changes the syscall number to -1,
the kernel no longer initializes a0 with -ENOSYS and therefore does not
return the error code of the failed syscall to userspace.
I wish you had run the strace test suite before changing this part of the
kernel. I'm now going to apply a workaround [1] in strace, but please
note that riscv seems to be the only Linux architecture where such a
workaround is currently required.
There was a similar kernel bug once on parisc, but it was fixed [2]
several years ago by commit b7dc5a071ddf.
[1] https://github.com/strace/strace/commit/c3ae2b27732952663a3600269884e363cb77a024
[2] https://git.kernel.org/torvalds/c/b7dc5a071ddf69c0350396b203cba32fe5bab510
--
ldv