[PATCH v6 6/9] seccomp: add "seccomp" syscall

Andy Lutomirski luto at amacapital.net
Fri Jun 13 14:42:03 PDT 2014


On Fri, Jun 13, 2014 at 2:37 PM, Alexei Starovoitov <ast at plumgrid.com> wrote:
> On Fri, Jun 13, 2014 at 2:25 PM, Andy Lutomirski <luto at amacapital.net> wrote:
>> On Fri, Jun 13, 2014 at 2:22 PM, Alexei Starovoitov <ast at plumgrid.com> wrote:
>>> On Tue, Jun 10, 2014 at 8:25 PM, Kees Cook <keescook at chromium.org> wrote:
>>>> This adds the new "seccomp" syscall with both an "operation" and "flags"
>>>> parameter for future expansion. The third argument is a pointer value,
>>>> used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
>>>> be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
>>>>
>>>> Signed-off-by: Kees Cook <keescook at chromium.org>
>>>> Cc: linux-api at vger.kernel.org
>>>> ---
>>>>  arch/x86/syscalls/syscall_32.tbl  |    1 +
>>>>  arch/x86/syscalls/syscall_64.tbl  |    1 +
>>>>  include/linux/syscalls.h          |    2 ++
>>>>  include/uapi/asm-generic/unistd.h |    4 ++-
>>>>  include/uapi/linux/seccomp.h      |    4 +++
>>>>  kernel/seccomp.c                  |   63 ++++++++++++++++++++++++++++++++-----
>>>>  kernel/sys_ni.c                   |    3 ++
>>>>  7 files changed, 69 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
>>>> index d6b867921612..7527eac24122 100644
>>>> --- a/arch/x86/syscalls/syscall_32.tbl
>>>> +++ b/arch/x86/syscalls/syscall_32.tbl
>>>> @@ -360,3 +360,4 @@
>>>>  351    i386    sched_setattr           sys_sched_setattr
>>>>  352    i386    sched_getattr           sys_sched_getattr
>>>>  353    i386    renameat2               sys_renameat2
>>>> +354    i386    seccomp                 sys_seccomp
>>>> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
>>>> index ec255a1646d2..16272a6c12b7 100644
>>>> --- a/arch/x86/syscalls/syscall_64.tbl
>>>> +++ b/arch/x86/syscalls/syscall_64.tbl
>>>> @@ -323,6 +323,7 @@
>>>>  314    common  sched_setattr           sys_sched_setattr
>>>>  315    common  sched_getattr           sys_sched_getattr
>>>>  316    common  renameat2               sys_renameat2
>>>> +317    common  seccomp                 sys_seccomp
>>>>
>>>>  #
>>>>  # x32-specific system call numbers start at 512 to avoid cache impact
>>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>>> index b0881a0ed322..1713977ee26f 100644
>>>> --- a/include/linux/syscalls.h
>>>> +++ b/include/linux/syscalls.h
>>>> @@ -866,4 +866,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
>>>>  asmlinkage long sys_kcmp(pid_t pid1, pid_t pid2, int type,
>>>>                          unsigned long idx1, unsigned long idx2);
>>>>  asmlinkage long sys_finit_module(int fd, const char __user *uargs, int flags);
>>>> +asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
>>>> +                           const char __user *uargs);
>>>
>>> It looks odd to add 'flags' argument to syscall that is not even used.
>>> I don't think it will be extensible this way.
>>> 'uargs' is used only in 2nd command as well and it's not 'char __user *'
>>> but rather 'struct sock_fprog __user *'
>>> I think it makes more sense to define only first argument as 'int op' and the
>>> rest as variable length array.
>>> Something like:
>>> long sys_seccomp(unsigned int op, struct nlattr *attrs, int len);
>>> then different commands can interpret 'attrs' differently.
>>> if op == mode_strict, then attrs == NULL, len == 0
>>> if op == mode_filter, then attrs->nla_type == seccomp_bpf_filter
>>> and nla_data(attrs) is 'struct sock_fprog'
>>
>> Eww.  If the operation doesn't imply the type, then I think we've
>> totally screwed up.
>>
>>> If we decide to add new types of filters or new commands, the syscall prototype
>>> won't need to change. New commands can be added preserving backward
>>> compatibility.
>>> The basic TLV concept has been around forever in netlink world. imo makes
>>> sense to use it with new syscalls. Passing 'struct xxx' into syscalls is a
>>> thing of the past. TLV style is more extensible. Fields of structures can become
>>> optional in the future, new fields added, etc.
>>> 'struct nlattr' brings the same benefits to kernel api as protobuf did
>>> to user land.
>>
>> I see no reason to bring nl_attr into this.
>>
>> Admittedly, I've never dealt with nl_attr, but everything
>> netlink-related I've ever been involved in has involved some sort of
>> API atrocity.
>
> netlink has a lot of legacy and there is genetlink which is not pretty
> either because of extra socket creation, binding, dealing with packet
> loss issues, but the key concept of variable length encoding is sound.
> Right now seccomp has two commands and they already don't fit
> into single syscall neatly. Are you saying there should be two syscalls
> here? What about another seccomp related command? Another syscall?
> imo all seccomp related commands need to be mux/demux-ed under
> one syscall. What is the way to mux/demux potentially very different
> commands under one syscall? I cannot think of anything better than
> TLV style. 'struct nlattr' is what we have today and I think it works fine.
> I'm not suggesting to bring the whole netlink into the picture, but rather
> TLV style of encoding different arguments for different commands.

I'm unconvinced.  These are simple commands, and I think the interface
should be simple.  Syscalls are cheap.

As an example, the interface could be:

int seccomp_add_filter(const struct sock_fprog *filter, unsigned int flags);

The "tsync" operation would be seccomp_add_filter(NULL,
SECCOMP_ADD_FILTER_TSYNC) -- it's equivalent to adding an
always-accept filter and syncing threads.
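
To make the proposed semantics concrete, here is a rough userspace model of
that call, not kernel code. Everything in it is an assumption made for
illustration: the value of SECCOMP_ADD_FILTER_TSYNC, the *_model names, and
the choice to treat a NULL filter without the TSYNC flag as an error. An
always-accept filter is modeled as "install nothing".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag name comes from the message above; the numeric value is made up. */
#define SECCOMP_ADD_FILTER_TSYNC 0x1u

/* Stand-in for struct sock_fprog so the sketch builds without kernel headers. */
struct sock_fprog_model {
	unsigned short len;	/* number of BPF instructions */
	const void *filter;	/* instruction array (opaque here) */
};

/* Hypothetical per-process state, only used to observe the effect of a call. */
struct seccomp_state_model {
	unsigned int nr_filters;
	int threads_synced;
};

/*
 * Model of: int seccomp_add_filter(const struct sock_fprog *filter,
 *                                  unsigned int flags);
 * The operation fixes the argument type; flags only modify that operation.
 */
int seccomp_add_filter_model(struct seccomp_state_model *st,
			     const struct sock_fprog_model *filter,
			     unsigned int flags)
{
	assert(st != NULL);

	/* Reject unknown flag bits so they stay available for later use. */
	if (flags & ~SECCOMP_ADD_FILTER_TSYNC)
		return -EINVAL;

	/* NULL filter with no flags would be a no-op; treated as an error here. */
	if (filter == NULL && !(flags & SECCOMP_ADD_FILTER_TSYNC))
		return -EINVAL;

	if (filter != NULL) {
		if (filter->len == 0 || filter->filter == NULL)
			return -EINVAL;
		st->nr_filters++;
	}

	/* NULL + TSYNC: behaves like an always-accept filter plus a sync. */
	if (flags & SECCOMP_ADD_FILTER_TSYNC)
		st->threads_synced = 1;

	return 0;
}
```

With this shape, seccomp_add_filter_model(&st, NULL, SECCOMP_ADD_FILTER_TSYNC)
only syncs threads, and any flag bit the model does not know about fails with
-EINVAL.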

But, frankly, this kind of stuff should probably be "do operation X".
IIUC nl_attr is more like "do something, with these tags and values",
which results in oddities like whatever should happen if more than one
tag is set.
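
A small userspace sketch of the oddity, assuming a header with a 16-bit
length and 16-bit type in the style of struct nlattr. The struct and helper
names (tlv_hdr, tlv_put, tlv_count_type) are hypothetical and are not the
kernel's nla_* API. The point is only that a well-formed buffer can carry the
same tag twice, and the encoding itself does not say what the receiver should
do about it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Redefined locally so this builds without kernel headers. */
struct tlv_hdr {
	uint16_t len;	/* header plus payload, in bytes */
	uint16_t type;
};

/* Attributes are padded to 4-byte boundaries in this sketch. */
#define TLV_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)

/* Append one attribute at offset off; returns the new offset, or 0 on error. */
size_t tlv_put(unsigned char *buf, size_t cap, size_t off, uint16_t type,
	       const void *payload, uint16_t plen)
{
	struct tlv_hdr h;
	size_t total = sizeof(h) + plen;
	size_t padded = TLV_ALIGN(total);

	assert(buf != NULL);
	if (total > UINT16_MAX || off > cap || padded > cap - off)
		return 0;

	h.len = (uint16_t)total;
	h.type = type;
	memcpy(buf + off, &h, sizeof(h));
	if (plen)
		memcpy(buf + off + sizeof(h), payload, plen);
	memset(buf + off + total, 0, padded - total);
	return off + padded;
}

/* Count attributes of the given type; returns -1 if the buffer is malformed. */
int tlv_count_type(const unsigned char *buf, size_t buflen, uint16_t type)
{
	size_t off = 0;
	int count = 0;

	while (buflen - off >= sizeof(struct tlv_hdr)) {
		struct tlv_hdr h;
		size_t step;

		memcpy(&h, buf + off, sizeof(h));
		if (h.len < sizeof(h) || h.len > buflen - off)
			return -1;
		if (h.type == type)
			count++;

		step = TLV_ALIGN(h.len);
		if (step > buflen - off)
			break;	/* last attribute, no trailing padding */
		off += step;
	}
	return count;
}
```

If tlv_count_type() returns 2 for a tag that is supposed to appear once, the
receiver has to pick a policy (first wins, last wins, or reject), and that
policy becomes part of the ABI.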

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
