[PATCH] riscv/entry: get correct syscall number from syscall_get_nr()

Thomas Gleixner tglx at linutronix.de
Fri Oct 25 06:12:14 PDT 2024


On Mon, Oct 21 2024 at 09:46, Björn Töpel wrote:
> Celeste Liu <coelacanthushex at gmail.com> writes:
>> 1. syscall_enter_from_user_mode() will do two things:
>>    1) the return value is only to inform whether the syscall should be skipped.
>>    2) regs will be modified by filters (seccomp or ptrace and so on).
>> 2. for common entry users, there are two pieces of information: the syscall number and
>>    the return value of syscall_enter_from_user_mode() (called is_skipped below).
>>    so there are three situations:
>>    1) if syscall number is invalid, the syscall should not be performed, and
>>       we set a0 to -ENOSYS to inform userspace the syscall doesn't exist.
>>    2) if syscall number is valid, is_skipped will be used:
>>       a) if is_skipped is -1, which means some filter rejected this syscall,
>>          so the syscall should not be performed. (Of course, we can use bool instead to
>>          get better semantics)
>>       b) if is_skipped != -1, which means the filters approved this syscall,
>>          so we invoke syscall handler with modified regs.
>>
>> In your design, the logical condition is not obvious. Why does syscall_enter_from_user_mode()
>> report that the syscall will be skipped, yet the syscall handler is called
>> when the syscall number is invalid? The users need to think about two things to get the result:
>> a) -1 means skip
>> b) -1 < 0 as a signed integer, so the skip condition is always an invalid syscall number.
>>
>> In my way, the users only need to think about one thing: syscall_enter_from_user_mode()
>> said -1 means the syscall should not be performed, so use it as the reject condition
>> directly. They just need to combine the information that they get from the API as the
>> condition of control flow.
>
> I'm all-in for simpler API usage! Maybe massage
> syscall_enter_from_user_mode() (or add a new one), so that the additional
> syscall_get_nr() call is not needed?

It's completely unclear to me what the actual problem is. The way this
works on all architectures is:

       regs->orig_a0 = regs->a0;
       regs->a0 = -ENOSYS;

       nr = syscall_enter_from_user_mode(....);

       if (nr >= 0)
          regs->a0 = nr < MAX_SYSCALL ? syscall(nr) : -ENOSYS;

If syscall_trace_enter() returns -1 to skip the syscall, then regs->a0
is left as is, unless a tracer or seccomp filter modified it.

If syscall_trace_enter() was not active (no tracer, no seccomp ...) then
regs->a0 already contains -ENOSYS.
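
For illustration only, the flow above can be modelled in user space. This is
a rough sketch, not kernel code: struct mock_regs, mock_enter(), do_syscall(),
the two handlers and the table size are all made up for the example, and
mock_enter() only stands in for syscall_enter_from_user_mode() by returning
either the syscall number or -1. It does not model a filter rewriting regs.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_SYSCALL 2	/* hypothetical table size for this sketch */

/* Mock of the two registers the flow touches; not the kernel's pt_regs. */
struct mock_regs {
	long a0;
	long orig_a0;
};

/* Hypothetical syscall handlers, present only so the table has entries. */
static long sys_zero(void) { return 0; }
static long sys_forty_two(void) { return 42; }

static long (*const syscall_table[MAX_SYSCALL])(void) = {
	sys_zero,
	sys_forty_two,
};

/*
 * Stand-in for syscall_enter_from_user_mode(): returns -1 when a filter
 * rejects the syscall, otherwise the syscall number unchanged.
 */
static long mock_enter(struct mock_regs *regs, long nr, bool filter_rejects)
{
	(void)regs;	/* this mock never rewrites regs */
	return filter_rejects ? -1 : nr;
}

/* The flow from the mail: preset a0 to -ENOSYS, dispatch only if nr >= 0. */
static long do_syscall(long nr, long a0, bool filter_rejects)
{
	struct mock_regs regs = { .a0 = a0, .orig_a0 = 0 };

	regs.orig_a0 = regs.a0;
	regs.a0 = -ENOSYS;

	nr = mock_enter(&regs, nr, filter_rejects);

	if (nr >= 0)
		regs.a0 = nr < MAX_SYSCALL ? syscall_table[nr]() : -ENOSYS;

	return regs.a0;
}
```

In this model a valid number is dispatched, while an out-of-range number, a
rejected syscall and a raw -1 from user space all end with a0 == -ENOSYS,
without a separate syscall_get_nr() call.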

So what's the exact problem?

Thanks,

        tglx


