[PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1
Celeste Liu
coelacanthushex at gmail.com
Thu Jun 27 00:47:47 PDT 2024
On 2024-06-27 15:14, Dmitry V. Levin wrote:
> Hi,
>
> On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
>> When we test seccomp with 6.4 kernel, we found errno has wrong value.
>> If we deny NETLINK_AUDIT with EAFNOSUPPORT, after f0bddf50586d, we will
>> get ENOSYS instead. We got same result with commit 9c2598d43510 ("riscv:
>> entry: Save a0 prior syscall_enter_from_user_mode()").
>>
>> After analysing code, we think that regs->a0 = -ENOSYS should only be
>> executed when syscall != -1. In __seccomp_filter, when seccomp rejected
>> this syscall with specified errno, they will set a0 to return number as
>> syscall ABI, and then return -1. This return number is finally pass as
>> return number of syscall_enter_from_user_mode, and then is compared with
>> NR_syscalls after converted to ulong (so it will be ULONG_MAX). The
>> condition syscall < NR_syscalls will always be false, so regs->a0 = -ENOSYS
>> is always executed. It covered a0 set by seccomp, so we always get
>> ENOSYS when match seccomp RET_ERRNO rule.
>>
>> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
>> Reported-by: Felix Yan <felixonmars at archlinux.org>
>> Co-developed-by: Ruizhe Pan <c141028 at gmail.com>
>> Signed-off-by: Ruizhe Pan <c141028 at gmail.com>
>> Co-developed-by: Shiqi Zhang <shiqi at isrc.iscas.ac.cn>
>> Signed-off-by: Shiqi Zhang <shiqi at isrc.iscas.ac.cn>
>> Signed-off-by: Celeste Liu <CoelacanthusHex at gmail.com>
>> Tested-by: Felix Yan <felixonmars at archlinux.org>
>> Tested-by: Emil Renner Berthing <emil.renner.berthing at canonical.com>
>> Reviewed-by: Björn Töpel <bjorn at rivosinc.com>
>> Reviewed-by: Guo Ren <guoren at kernel.org>
>> ---
>>
>> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing at canonical.com>
>> v3 -> v4: use long instead of ulong to reduce type cast and avoid
>> implementation-defined behavior, and make the judgment of syscall
>> invalid more explicit
>> v2 -> v3: use if-statement instead of set default value,
>> clarify the type of syscall
>> v1 -> v2: added explanation on why always got ENOSYS
>>
>> arch/riscv/kernel/traps.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>> index f910dfccbf5d2..729f79c97e2bf 100644
>> --- a/arch/riscv/kernel/traps.c
>> +++ b/arch/riscv/kernel/traps.c
>> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
>> asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>> {
>> if (user_mode(regs)) {
>> - ulong syscall = regs->a7;
>> + long syscall = regs->a7;
>>
>> regs->epc += 4;
>> regs->orig_a0 = regs->a0;
>> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>
>> syscall = syscall_enter_from_user_mode(regs, syscall);
>>
>> - if (syscall < NR_syscalls)
>> + if (syscall >= 0 && syscall < NR_syscalls)
>> syscall_handler(regs, syscall);
>> - else
>> + else if (syscall != -1)
>> regs->a0 = -ENOSYS;
>>
>> syscall_exit_to_user_mode(regs);
>
> Unfortunately, this change introduced a regression: it broke strace
> syscall tampering on riscv. When the tracer changes syscall number to -1,
> the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
> return the error code of the failed syscall to userspace.
In the patch v2, we actually do the right thing. But as Björn Töpel's
suggestion and we found cast long to ulong is implementation-defined
behavior in C, so we change it to current form. So revert this patch and
apply patch v2 should fix this issue. Patch v2 uses ths same way with
other architectures.
[1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
>
> I wish you were running strace test suite before changing this part of the
> kernel. Now I'm going to apply a workaround [1] in strace, but please
> note that riscv seems to be the only linux architecture where such a
> workaround is currently required.
>
> There was a similar kernel bug once on parisc, but it was fixed [2]
> several years ago by commit b7dc5a071ddf.
>
> [1] https://github.com/strace/strace/commit/c3ae2b27732952663a3600269884e363cb77a024
> [2] https://git.kernel.org/torvalds/c/b7dc5a071ddf69c0350396b203cba32fe5bab510
>
>
More information about the linux-riscv
mailing list