[PATCH v6 6/9] seccomp: add "seccomp" syscall

Kees Cook keescook at chromium.org
Fri Jun 13 15:01:21 PDT 2014


On Fri, Jun 13, 2014 at 2:42 PM, Andy Lutomirski <luto at amacapital.net> wrote:
> On Fri, Jun 13, 2014 at 2:37 PM, Alexei Starovoitov <ast at plumgrid.com> wrote:
>> On Fri, Jun 13, 2014 at 2:25 PM, Andy Lutomirski <luto at amacapital.net> wrote:
>>> On Fri, Jun 13, 2014 at 2:22 PM, Alexei Starovoitov <ast at plumgrid.com> wrote:
>>>> On Tue, Jun 10, 2014 at 8:25 PM, Kees Cook <keescook at chromium.org> wrote:
>>>>> This adds the new "seccomp" syscall with both an "operation" and "flags"
>>>>> parameter for future expansion. The third argument is a pointer value,
>>>>> used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
>>>>> be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
>>>>>
>>>>> Signed-off-by: Kees Cook <keescook at chromium.org>
>>>>> Cc: linux-api at vger.kernel.org
>>>>> ---
>>>>>  arch/x86/syscalls/syscall_32.tbl  |    1 +
>>>>>  arch/x86/syscalls/syscall_64.tbl  |    1 +
>>>>>  include/linux/syscalls.h          |    2 ++
>>>>>  include/uapi/asm-generic/unistd.h |    4 ++-
>>>>>  include/uapi/linux/seccomp.h      |    4 +++
>>>>>  kernel/seccomp.c                  |   63 ++++++++++++++++++++++++++++++++-----
>>>>>  kernel/sys_ni.c                   |    3 ++
>>>>>  7 files changed, 69 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
>>>>> index d6b867921612..7527eac24122 100644
>>>>> --- a/arch/x86/syscalls/syscall_32.tbl
>>>>> +++ b/arch/x86/syscalls/syscall_32.tbl
>>>>> @@ -360,3 +360,4 @@
>>>>>  351    i386    sched_setattr           sys_sched_setattr
>>>>>  352    i386    sched_getattr           sys_sched_getattr
>>>>>  353    i386    renameat2               sys_renameat2
>>>>> +354    i386    seccomp                 sys_seccomp
>>>>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>>>>> index ec255a1646d2..16272a6c12b7 100644
>>>>> --- a/arch/x86/syscalls/syscall_64.tbl
>>>>> +++ b/arch/x86/syscalls/syscall_64.tbl
>>>>> @@ -323,6 +323,7 @@
>>>>>  314    common  sched_setattr           sys_sched_setattr
>>>>>  315    common  sched_getattr           sys_sched_getattr
>>>>>  316    common  renameat2               sys_renameat2
>>>>> +317    common  seccomp                 sys_seccomp
>>>>>
>>>>>  #
>>>>>  # x32-specific system call numbers start at 512 to avoid cache impact
>>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>>> index b0881a0ed322..1713977ee26f 100644
>>>>> --- a/include/linux/syscalls.h
>>>>> +++ b/include/linux/syscalls.h
>>>>> @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>>>>  asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
>>>>>                          unsigned long idx1, unsigned long idx2);
>>>>>  asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
>>>>> +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>>>>> +                           const char __user *uargs);
>>>>
>>>> It looks odd to add 'flags' argument to syscall that is not even used.
>>>> It don't think it will be extensible this way.
>>>> 'uargs' is used only in 2nd command as well and it's not 'char __user *'
>>>> but rather 'struct sock_fprog __user *'
>>>> I think it makes more sense to define only first argument as 'int op' and the
>>>> rest as variable length array.
>>>> Something like:
>>>> long sys_seccomp(unsigned int op, struct nlattr *attrs, int len);
>>>> then different commands can interpret 'attrs' differently.
>>>> if op == mode_strict, then attrs == NULL, len == 0
>>>> if op == mode_filter, then attrs->nla_type == seccomp_bpf_filter
>>>> and nla_data(attrs) is 'struct sock_fprog'
>>>
>>> Eww.  If the operation doesn't imply the type, then I think we've
>>> totally screwed up.
>>>
>>>> If we decide to add new types of filters or new commands, the syscall prototype
>>>> won't need to change. New commands can be added preserving backward
>>>> compatibility.
>>>> The basic TLV concept has been around forever in netlink world. imo makes
>>>> sense to use it with new syscalls. Passing 'struct xxx' into syscalls
>>>> is the thing
>>>> of the past. TLV style is more extensible. Fields of structures can become
>>>> optional in the future, new fields added, etc.
>>>> 'struct nlattr' brings the same benefits to kernel api as protobuf did
>>>> to user land.
>>>
>>> I see no reason to bring nl_attr into this.
>>>
>>> Admittedly, I've never dealt with nl_attr, but everything
>>> netlink-related I've even been involved in has involved some sort of
>>> API atrocity.
>>
>> netlink has a lot of legacy and there is genetlink which is not pretty
>> either because of extra socket creation, binding, dealing with packet
>> loss issues, but the key concept of variable length encoding is sound.
>> Right now seccomp has two commands and they already don't fit
>> into single syscall neatly. Are you saying there should be two syscalls
>> here? What about another seccomp related command? Another syscall?
>> imo all seccomp related commands needs to be mux/demux-ed under
>> one syscall. What is the way to mux/demux potentially very different
>> commands under one syscall? I cannot think of anything better than
>> TLV style. 'struct nlattr' is what we have today and I think it works fine.
>> I'm not suggesting to bring the whole netlink into the picture, but rather
>> TLV style of encoding different arguments for different commands.
>
> I'm unconvinced.  These are simple commands, and I think the interface
> should be simple.  Syscalls are cheap.
>
> As an example, the interface could be:
>
> int seccomp_add_filter(const struct sock_fprog *filter, unsigned int flags);
>
> The "tsync" operation would be seccomp_add_filter(NULL,
> SECCOMP_ADD_FILTER_TSYNC) -- it's equivalent to adding an
> always-accept filter and syncing threads.

I think you convinced me that tsync should be part of adding a filter
(since now there are no failure side-effects), so this specific
example I would expect EFAULT from. But ...

>
> But, frankly, this kind of stuff should probably be "do operation X".
> IIUC nl_attr is more like "do something, with these tags and values",
> which results in oddities like whatever should happen of more than one
> tag is set.

I have no objection to eliminating seccomp-strict from the syscall,
and just making this the "add seccomp filter" syscall. My only
hesitation would be that if we need something besides adding a filter
in the future, we'd be back to extending this awkwardly or adding
another syscall. That's why I went with the "operation" argument. I'm
not opposed to passing attrs and len, but seccomp_add_filter does feel
cleaner.

-Kees

-- 
Kees Cook
Chrome OS Security



More information about the linux-arm-kernel mailing list