[RFC] riscv/entry: issue about a0/orig_a0 register and ENOSYS

Alexandre Ghiti alex at ghiti.fr
Wed Oct 16 05:00:03 PDT 2024


Hi Celeste,

Thank you for looking into this and really sorry about the late response.

On 17/09/2024 06:09, Celeste Liu wrote:
> Before PTRACE_GET_SYSCALL_INFO was implemented in v5.3, the only way to
> get syscall arguments was to get user_regs_struct via PTRACE_GETREGSET.
> On some architectures where a register is used as both the first
> argument and the return value and thus will be changed at some stage of
> the syscall process, something like orig_a0 is provided to save the
> original argument register value. But RISC-V doesn't export orig_a0 in
> user_regs_struct (This ABI was designed at e2c0cdfba7f6 ("RISC-V:
> User-facing API").) so userspace application like strace will get the
> right or wrong result depends on the operation order in do_trap_ecall_u()
> function.
>
> This requires we put the ENOSYS process after syscall_enter_from_user_mode()
> or syscall_handler()[1]. Unfortunately, the generic entry API
> syscall_enter_from_user_mode() requires we
>
> *  process ENOSYS before syscall_enter_from_user_mode()


Where does this requirement come from?


> *  or only set a0 to ENOSYS when the return value of
>     syscall_enter_from_user_mode() != -1
>
> Again, if we choose the latter way to avoid conflict with the first
> issue, we will meet the third problem: strace depends on that kernel
> will return ENOSYS when syscall nr is -1 to implement their syscall
> tampering feature[2].


IIUC, seccomp and others in syscall_enter_from_user_mode() could return 
-1 and then we could not differentiate with the syscall number being -1.

But could we imagine, to distinguish between an error and the syscall 
number being -1, checking again the syscall number after we call 
syscall_enter_from_user_mode()? If the syscall number is -1, then we set 
ENOSYS otherwise we don't do anything (a bit like what you did in 
52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")).

Let me know if I completely misunderstood here!

Thanks again for the thorough explanation,

Alex


>
> Actually, we tried the both ways in 52449c17bdd1 ("riscv: entry: set
> a0 = -ENOSYS only when syscall != -1") and 61119394631f ("riscv: entry:
> always initialize regs->a0 to -ENOSYS") before.
>
> Naturally, there is a solution:
>
> 1. Just add orig_a0 in user_regs_struct and let strace use it as
>     loongarch does. So only two problems, which can be resolved without
>     conflict, are left here.
>
> The conflicts are the direct result of the limitation of generic entry
> API, so we have another two solutions:
>
> 2. Give up the generic entry API, and switch back to the
>     architecture-specific standardalone implementation.
> 3. Redesign the generic entry API: the problem was caused by
>     syscall_enter_from_user_mode() using the value -1 (which is unused
>     normally) of syscall nr to inform syscall was reject by seccomp/bpf.
>
> In theory, the Solution 1 is best:
>
> *  a0 was used for two purposes in ABI, so using two variables to store
>     it is natural.
> *  Userspace can implement features without depending on the internal
>     behavior of the kernel.
>
> Unfortunately, it's difficult to implement based on the current code.
> The RISC-V defined struct pt_regs as below:
>
>          struct pt_regs {
>                  unsigned long epc;
>          ...
>                  unsigned long t6;
>                  /* Supervisor/Machine CSRs */
>                  unsigned long status;
>                  unsigned long badaddr;
>                  unsigned long cause;
>                  /* a0 value before the syscall */
>                  unsigned long orig_a0;
>          };
>
> And user_regs_struct needs to be a prefix of struct pt_regs, so if we
> want to include orig_a0 in user_regs_struct, we will need to include
> Supervisor/Machine CSRs as well. It's not a big problem. Since
> struct pt_regs is the internal ABI of the kernel, we can reorder it.
> Unfortunately, struct user_regs_struct is defined as below:
>
>          struct user_regs_struct {
>                  unsigned long pc;
>          ...
>                  unsigned long t6;
>          };
>
> It doesn't contain something like reserved[] as padding to leave the
> space to add more registers from struct pt_regs!
> The loongarch do the right thing as below:
>
>          struct user_pt_regs {
>                  /* Main processor registers. */
>                  unsigned long regs[32];
>          ...
>                  unsigned long reserved[10];
>          } __attribute__((aligned(8)));
>
> RISC-V can't include orig_a0 in user_regs_struct without breaking UABI.
>
> Need a discussion to decide to use which solution, or is there any
> other better solution?
>
> [1]: https://github.com/strace/strace/issues/315
> [2]: https://lore.kernel.org/linux-riscv/20240627071422.GA2626@altlinux.org/
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



More information about the linux-riscv mailing list