[PATCH v5] riscv: entry: set a0 = -ENOSYS only when syscall != -1

Celeste Liu coelacanthushex at gmail.com
Thu Jun 27 03:11:20 PDT 2024


On 2024-06-27 17:43, Björn Töpel wrote:
> On Thu, Jun 27, 2024 at 9:47 AM Celeste Liu <coelacanthushex at gmail.com> wrote:
>>
>> On 2024-06-27 15:14, Dmitry V. Levin wrote:
>>
>>> Hi,
>>>
>>> On Tue, Aug 01, 2023 at 10:15:16PM +0800, Celeste Liu wrote:
>>>> When testing seccomp on a 6.4 kernel, we found that errno had the wrong
>>>> value. If we deny NETLINK_AUDIT with EAFNOSUPPORT, then after f0bddf50586d
>>>> we get ENOSYS instead. We got the same result with commit 9c2598d43510
>>>> ("riscv: entry: Save a0 prior syscall_enter_from_user_mode()").
>>>>
>>>> After analysing the code, we think that regs->a0 = -ENOSYS should only be
>>>> executed when syscall != -1. In __seccomp_filter, when seccomp rejects a
>>>> syscall with a specified errno, it sets a0 to the return value as the
>>>> syscall ABI requires, and then returns -1. This return value ultimately
>>>> becomes the return value of syscall_enter_from_user_mode, and is then
>>>> compared with NR_syscalls after being converted to ulong (so it becomes
>>>> ULONG_MAX). The condition syscall < NR_syscalls is therefore always false,
>>>> so regs->a0 = -ENOSYS is always executed. It overwrites the a0 set by
>>>> seccomp, so we always get ENOSYS when a seccomp RET_ERRNO rule matches.
>>>>
>>>> Fixes: f0bddf50586d ("riscv: entry: Convert to generic entry")
>>>> Reported-by: Felix Yan <felixonmars at archlinux.org>
>>>> Co-developed-by: Ruizhe Pan <c141028 at gmail.com>
>>>> Signed-off-by: Ruizhe Pan <c141028 at gmail.com>
>>>> Co-developed-by: Shiqi Zhang <shiqi at isrc.iscas.ac.cn>
>>>> Signed-off-by: Shiqi Zhang <shiqi at isrc.iscas.ac.cn>
>>>> Signed-off-by: Celeste Liu <CoelacanthusHex at gmail.com>
>>>> Tested-by: Felix Yan <felixonmars at archlinux.org>
>>>> Tested-by: Emil Renner Berthing <emil.renner.berthing at canonical.com>
>>>> Reviewed-by: Björn Töpel <bjorn at rivosinc.com>
>>>> Reviewed-by: Guo Ren <guoren at kernel.org>
>>>> ---
>>>>
>>>> v4 -> v5: add Tested-by Emil Renner Berthing <emil.renner.berthing at canonical.com>
>>>> v3 -> v4: use long instead of ulong to reduce type casts and avoid
>>>>           implementation-defined behavior, and make the check for an
>>>>           invalid syscall more explicit
>>>> v2 -> v3: use an if-statement instead of setting a default value,
>>>>           clarify the type of syscall
>>>> v1 -> v2: add an explanation of why we always got ENOSYS
>>>>
>>>>  arch/riscv/kernel/traps.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
>>>> index f910dfccbf5d2..729f79c97e2bf 100644
>>>> --- a/arch/riscv/kernel/traps.c
>>>> +++ b/arch/riscv/kernel/traps.c
>>>> @@ -297,7 +297,7 @@ asmlinkage __visible __trap_section void do_trap_break(struct pt_regs *regs)
>>>>  asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>>>  {
>>>>      if (user_mode(regs)) {
>>>> -            ulong syscall = regs->a7;
>>>> +            long syscall = regs->a7;
>>>>
>>>>              regs->epc += 4;
>>>>              regs->orig_a0 = regs->a0;
>>>> @@ -306,9 +306,9 @@ asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs)
>>>>
>>>>              syscall = syscall_enter_from_user_mode(regs, syscall);
>>>>
>>>> -            if (syscall < NR_syscalls)
>>>> +            if (syscall >= 0 && syscall < NR_syscalls)
>>>>                      syscall_handler(regs, syscall);
>>>> -            else
>>>> +            else if (syscall != -1)
>>>>                      regs->a0 = -ENOSYS;
>>>>
>>>>              syscall_exit_to_user_mode(regs);
>>>
>>> Unfortunately, this change introduced a regression: it broke strace
>>> syscall tampering on riscv.  When the tracer changes syscall number to -1,
>>> the kernel fails to initialize a0 with -ENOSYS and subsequently fails to
>>> return the error code of the failed syscall to userspace.
>>
>> In patch v2 [1], we actually did the right thing. But following Björn
>> Töpel's suggestion, and because we found that casting long to ulong is
>> implementation-defined behavior in C, we changed it to the current form.
>> So reverting this patch and applying patch v2 should fix this issue.
>> Patch v2 uses the same approach as other architectures.
>>
>> [1]: https://lore.kernel.org/all/20230718162940.226118-1-CoelacanthusHex@gmail.com/
> 
> Not reverting, but a fix to make sure that a0 is initialized to -ENOSYS, e.g.:

Oh, I only meant to describe the change we need, not a literal 'git revert'.

> 
> --8<--
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05a16b1f0aee..51ebfd23e007 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
> 
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
> +		regs->a0 = -ENOSYS;
> 
>  		riscv_v_vstate_discard(regs);
> 
> @@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
> 
>  		if (syscall >= 0 && syscall < NR_syscalls)
>  			syscall_handler(regs, syscall);
> -		else if (syscall != -1)
> -			regs->a0 = -ENOSYS;
> +
>  		/*
>  		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
>  		 * so the maximum stack offset is 1k bytes (10 bits).
> --8<--

This is also what I think.
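
To double-check the reasoning, here is a rough userspace sketch (not kernel
code) of the three dispatch variants discussed in this thread. Everything in
it is made up for illustration: struct sim_regs, the hook_* functions that
stand in for syscall_enter_from_user_mode(), the dispatch_* functions, and
NR_SYSCALLS_SIM. The real syscall handler is replaced by simply storing 0 in
a0, and orig_a0 is not modelled.

```c
#include <errno.h>

/* Hypothetical stand-in for NR_syscalls; the exact value does not matter. */
#define NR_SYSCALLS_SIM 450L

/* Hypothetical, heavily reduced stand-in for struct pt_regs. */
struct sim_regs {
	long a0;	/* first argument on entry, return value on exit */
	long a7;	/* syscall number */
};

typedef long (*entry_hook)(struct sim_regs *regs, long nr);

/* Models seccomp RET_ERRNO: store the errno in a0, then return -1. */
static long hook_seccomp_errno(struct sim_regs *regs, long nr)
{
	(void)nr;
	regs->a0 = -EAFNOSUPPORT;
	return -1;
}

/* Models a tracer that rewrites the syscall number to -1 and leaves a0 alone. */
static long hook_tracer_skip(struct sim_regs *regs, long nr)
{
	(void)regs;
	(void)nr;
	return -1;
}

/* Models the normal case: nothing interferes with the syscall. */
static long hook_passthrough(struct sim_regs *regs, long nr)
{
	(void)regs;
	return nr;
}

/* Before the v5 patch: unsigned compare, so -1 always lands in the else branch. */
static long dispatch_pre_fix(struct sim_regs *regs, entry_hook hook)
{
	unsigned long nr = (unsigned long)hook(regs, regs->a7);

	if (nr < (unsigned long)NR_SYSCALLS_SIM)
		regs->a0 = 0;		/* pretend the handler succeeded */
	else
		regs->a0 = -ENOSYS;	/* overwrites the errno set by seccomp */
	return regs->a0;
}

/* With the v5 patch: -1 skips the assignment, so a0 may keep a stale value. */
static long dispatch_v5(struct sim_regs *regs, entry_hook hook)
{
	long nr = hook(regs, regs->a7);

	if (nr >= 0 && nr < NR_SYSCALLS_SIM)
		regs->a0 = 0;
	else if (nr != -1)
		regs->a0 = -ENOSYS;
	return regs->a0;
}

/* Proposed fix: default a0 to -ENOSYS before the entry hook runs. */
static long dispatch_proposed(struct sim_regs *regs, entry_hook hook)
{
	long nr;

	regs->a0 = -ENOSYS;
	nr = hook(regs, regs->a7);

	if (nr >= 0 && nr < NR_SYSCALLS_SIM)
		regs->a0 = 0;
	return regs->a0;
}
```

In this model, dispatch_pre_fix() loses the seccomp errno, dispatch_v5() keeps
it but leaves the original a0 in place when the tracer sets the syscall number
to -1, and dispatch_proposed() returns the expected value in both cases.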

> Celeste, do you want to cook that fix properly?

Yeah. I will send a patch to the mailing list soon.

> 
> 
> Björn



