[PATCH v10 0/11] seccomp: add thread sync ability

Wed Jul 16 14:23:25 PDT 2014

On Wed, Jul 16, 2014 at 12:45 PM, Andy Lutomirski <luto at amacapital.net> wrote:
> On Wed, Jul 16, 2014 at 10:54 AM, Kees Cook <keescook at chromium.org> wrote:
>> On Mon, Jul 14, 2014 at 12:04 PM, Andy Lutomirski <luto at amacapital.net> wrote:
>>> On Fri, Jul 11, 2014 at 10:55 AM, Kees Cook <keescook at chromium.org> wrote:
>>>> On Fri, Jul 11, 2014 at 9:49 AM, Oleg Nesterov <oleg at redhat.com> wrote:
>>>>> On 07/10, Kees Cook wrote:
>>>>>>
>>>>>> This adds the ability for threads to request seccomp filter
>>>>>> synchronization across their thread group (at filter attach time).
>>>>>> For example, for Chrome to make sure graphic driver threads are fully
>>>>>> confined after seccomp filters have been attached.
>>>>>>
>>>>>> To support this, locking on seccomp changes via thread-group-shared
>>>>>> sighand lock is introduced, along with refactoring of no_new_privs. Races
>>>>>> with thread creation are handled via delayed duplication of the seccomp
>>>>>> task struct field and cred_guard_mutex.
>>>>>>
>>>>>> This includes a new syscall (instead of adding a new prctl option),
>>>>>> as suggested by Andy Lutomirski and Michael Kerrisk.
>>>>>
>>>>> I do not not see any problems in this version,
>>>>
>>>> Awesome! Thank you for all the reviews. :) If Andy and Michael are
>>>> happy with this too, I think this is in good shape. \o/
>>>
>>> I think I'm happy with it.  Is it in git somewhere for easy perusal?
>>> I have a cold, so my reviewing ability is a bit off, but I want to
>>> take a look at the final version, and git is a little easier than
>>> email for this.
>>
>> Hi Andy,
>>
>> Have you had a chance to look v10 over? I'd like to send a v11 with
>> Oleg's Reviewed-by added (at James Morris's request). Should I add one
>> from you as well?
>
> Trivial nits (take them or leave them):
>
> seccomp_check_mode confused me.  Maybe seccomp_may_assign_mode would
> be a better name.

Good idea; I was unhappy with this name too. Done for v11.

> In the writer locking changelog, should "(which is unique to the
> thread group)" be "(which is shared by all threads in the thread
> group)"?

Done.

> This bit:
>
>     /*
>      * Explicitly enable no_new_privs here in case it got set
>      * between the task_struct being duplicated and holding the
>      * sighand lock. The seccomp state and nnp must be in sync.
>      */
>     if (task_no_new_privs(current))
>         task_set_no_new_privs(p);
>
> should arguably move the very end of task duplication -- someone will
> want it for some other purpose some day.

That certainly seems plausible, but for the moment, I want to keep nnp
near seccomp, since that's where we have a race.

> This bit:
>
>     /* If we have a seccomp mode, enable the thread flag. */
>     if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
>         set_tsk_thread_flag(p, TIF_SECCOMP);
>
> is not quite obvious to me -- it's not obvious why it's needed.  If
> it's to eliminate a race, can you explain the race in the comment?  If
> it's just paranoia, a WARN_ON or BUG_ON might be worthwhile.

I've expanded the comment; it's the same issue as nnp: seccomp could
have changed, so if we gained a mode, we need to set the flag too.

> SECCOMP_FILTER_FLAG_MASK seems backwards to me.  Maybe rename it to
> SECCOMP_FILTER_ALLOWED_FLAGS and invert it?

Excellent point. Other kernel users of the _MASK name aren't inverted. Fixed.

> is_ancestor: calling the first parameter "candidate" is just a tiny
> bit odd -- it's the child that's up for consideration.  (This is
> barely even worth the time it took me to type it.)

Switch argument and comment to "parent".

> Less trivial nits:
>
> Can you change:
>
> fp = kzalloc(fp_size, GFP_KERNEL|__GFP_NOWARN);
>
> to
>
> fp = kcalloc(fprog->len, sizeof(struct sock_filter), GFP_KERNEL|__GFP_NOWARN);
>
> somewhere in this series (w/ a changelog comment that it's not
> exploitable to avoid scaring people).

Done, though I used a BUG_ON since the compiler will optimize it out
(since it's an impossible state to be in).

> In seccomp_prepare_user_filter, would it make sense to return -EINVAL
> if !user_filter?  That will make it slightly more pleasant to
> implement TSYNC-without-change if anyone ever wants it.  (This isn't
> really necessary -- it's just slightly more polite.)

I can't do this since EFAULT is already used to detect seccomp
capabilities from userspace.

> Once James picks this up, I'll rebase my series on top of it.  I think
> they'll conflict slightly.

It's not bad. I tripped over sd -> sd_local and the related sd vs &sd
in the SK_RUN_FILTER call, but otherwise it was pretty trivial.

> Feel free to add my Reviewed-by to the whole series except the ARM and
> MIPS patches.

Thanks!

> And some thoughts:
>
> Your changelog comment for the seccomp syscall suggests that you're
> thinking about extending the tracer interface.  I'd suggest a rather
> different design: add a concept of a seccomp monitor associated with a
> particular filter and an action SECCOMP_RET_MONITOR.
> SECCOMP_RET_MONITOR will tell the monitor what syscall was attempted
> and will wait for further instructions.  The monitor can then ask for
> zero or more syscalls to be issued on behalf of the waiting process
> and then resume it.  Each of those additional syscalls will be further
> filtered through all seccomp filters outside of the one that returned
> SECCOMP_RET_MONITOR.
>
> This would avoid all of the nastiness inherent in using ptrace and
> would nest much more nicely.  And it could be far faster.
>
> If you decide to do something to ARM along the lines of what I'm doing
> to x86, you may want to fix this:
>
>     /*
>      * Make sure that any changes to mode from another thread have
>      * been seen after TIF_SECCOMP was seen.
>      */
>     rmb();
>
> It should have essentially no effect on x86, though.

Noted, thanks!

-Kees

-- 
Kees Cook
Chrome OS Security