[PATCH] kernel: introduce prctl(PR_LOG_UACCESS)

Jann Horn jannh at google.com
Wed Sep 22 08:59:19 PDT 2021


On Wed, Sep 22, 2021 at 5:30 PM Kees Cook <keescook at chromium.org> wrote:
> On Wed, Sep 22, 2021 at 09:23:10AM -0500, Eric W. Biederman wrote:
> > Peter Collingbourne <pcc at google.com> writes:
> >
> > > This patch introduces a kernel feature known as uaccess logging.
> > > With uaccess logging, the userspace program passes the address and size
> > > of a so-called uaccess buffer to the kernel via a prctl(). The prctl()
> > > is a request for the kernel to log any uaccesses made during the next
> > > syscall to the uaccess buffer. When the next syscall returns, the address
> > > one past the end of the logged uaccess buffer entries is written to the
> > > location specified by the third argument to the prctl(). In this way,
> > > the userspace program may enumerate the uaccesses logged to the access
> > > buffer to determine which accesses occurred.
> > > [...]
> > > 3) Kernel fuzzing. We may use the list of reported kernel accesses to
> > >    guide a kernel fuzzing tool such as syzkaller (so that it knows which
> > >    parts of user memory to fuzz), as an alternative to providing the tool
> > >    with a list of syscalls and their uaccesses (which again thanks to
> > >    (2) may not be accurate).
> >
> > How is logging the kernel's activity like this not a significant
> > information leak?  How is this safe for unprivileged users?
>
> This does result in userspace being able to "watch" the kernel progress
> through a syscall. I'd say it's less dangerous than userfaultfd, but
> still worrisome. (And userfaultfd is normally disabled[1] for unprivileged
> users trying to interpose the kernel accessing user memory.)
>
> Regardless, this is a pretty useful tool for this kind of fuzzing.
> Perhaps the timing exposure could be mitigated by having the kernel
> collect the record in a separate kernel-allocated buffer and flush the
> results to userspace at syscall exit? (This would solve the
> copy_to_user() recursion issue too.)

Other than what Kees has already said, the only security concern I
have with that patch should be trivial to fix: If the ->uaccess_buffer
machinery writes to current's memory, it must be reset during
execve(), before switching to the new mm, to prevent the old task from
causing the kernel to scribble into the new mm.
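
To make the ordering concrete, here is a minimal userspace sketch of that
reset, not the patch's actual code. Everything prefixed with sim_ (the task
and mm structs, the field names, the helpers) is hypothetical and only models
the idea that the logging state is cleared before the task starts using the
new mm:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct sim_mm {
    int id;
};

struct sim_task {
    struct sim_mm *mm;
    unsigned long uaccess_buffer_addr;  /* 0 means logging is disabled */
    unsigned long uaccess_buffer_size;
};

/* Drop any pending logging request made by the old program. */
void sim_uaccess_buffer_reset(struct sim_task *t)
{
    t->uaccess_buffer_addr = 0;
    t->uaccess_buffer_size = 0;
}

/* True if the kernel would still write log entries for this task. */
bool sim_would_log(const struct sim_task *t)
{
    return t->uaccess_buffer_addr != 0;
}

/* Model of the relevant ordering in execve(): reset first, then switch mm. */
void sim_execve(struct sim_task *t, struct sim_mm *new_mm)
{
    assert(new_mm != NULL);
    sim_uaccess_buffer_reset(t);  /* must happen before the mm switch */
    t->mm = new_mm;
}
```

With this ordering, an address that the old program registered can never be
interpreted relative to the new mm, because no request survives the switch.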

One aspect that might benefit from some clarification on intended
behavior is: what should happen if there are BPF tracing programs
running (possibly as part of some kind of system-wide profiling or
such) that poke around in userspace memory with BPF's uaccess helpers
(especially "bpf_copy_from_user")?

> I'm pondering what else might be getting exposed by creating this level
> of probing... kernel addresses would already be getting rejected, so
> they wouldn't show up in the buffer. Hmm. Jann, any thoughts here?
>
>
> Some other thoughts:
>
>
> Instead of reimplementing copy_*_user() with a new wrapper that
> bypasses some checks and adds others and has to stay in sync, etc,
> how about just adding a "recursion" flag? Something like:
>
>     copy_from_user(...)
>         instrument_copy_from_user(...)
>             uaccess_buffer_log_read(...)
>                 if (current->uaccess_buffer.writing)
>                     return;
>                 uaccess_buffer_log(...)
>                     current->uaccess_buffer.writing = true;
>                     copy_to_user(...)
>                     current->uaccess_buffer.writing = false;
>
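
The recursion flag outlined above can be tried out in userspace. The
following is a minimal sketch under stated assumptions, not kernel code: the
sim_ functions and the sim_state struct are hypothetical stand-ins for
copy_*_user() and current->uaccess_buffer, memcpy() stands in for the real
user copies, and the "skipped" counter exists only to make the guard visible:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the per-task state, current->uaccess_buffer. */
struct sim_uaccess_buffer {
    bool writing;    /* set while the logger itself writes to user memory */
    size_t logged;   /* accesses that produced a log entry */
    size_t skipped;  /* accesses ignored because the logger was active */
};

struct sim_uaccess_buffer sim_state;

void sim_copy_to_user(void *dst, const void *src, size_t n);

/* Write one log entry; this itself goes through the instrumented copy. */
void sim_uaccess_buffer_log(void)
{
    char entry[16] = {0};
    char log_slot[16];

    assert(!sim_state.writing);  /* the guard rules out nested logging */
    sim_state.writing = true;
    sim_copy_to_user(log_slot, entry, sizeof(log_slot));
    sim_state.writing = false;
    sim_state.logged++;
}

/* Instrumentation hook: log the access unless the logger caused it. */
void sim_uaccess_buffer_log_access(void)
{
    if (sim_state.writing) {
        sim_state.skipped++;
        return;
    }
    sim_uaccess_buffer_log();
}

void sim_copy_to_user(void *dst, const void *src, size_t n)
{
    sim_uaccess_buffer_log_access();
    memcpy(dst, src, n);
}

void sim_copy_from_user(void *dst, const void *src, size_t n)
{
    sim_uaccess_buffer_log_access();
    memcpy(dst, src, n);
}
```

A single sim_copy_from_user() call then produces exactly one log entry, and
the copy the logger performs to store that entry is skipped rather than
logged again.
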
>
> How about using this via seccomp instead of a per-syscall prctl? This
> would mean you would have very specific control over which syscalls
> should get the uaccess tracing, and wouldn't need to deal with
> the signal mask (I think). I would imagine something similar to
> SECCOMP_FILTER_FLAG_LOG, maybe SECCOMP_FILTER_FLAG_UACCESS_TRACE, and
> add a new top-level seccomp command, (like SECCOMP_GET_NOTIF_SIZES)
> maybe named SECCOMP_SET_UACCESS_TRACE_BUFFER.
>
> This would likely only make sense for SECCOMP_RET_TRACE or _TRAP if the
> program wants to collect the results after every syscall. And maybe this
> won't make any sense across exec (losing the mm that was used during
> SECCOMP_SET_UACCESS_TRACE_BUFFER). Hmmm.

And then I guess your plan would be that userspace would be expected
to use the userspace instruction pointer
(seccomp_data::instruction_pointer) to indicate instructions that
should be traced?

Or instead of seccomp, you could do it kinda like syscall user dispatch
(https://www.kernel.org/doc/html/latest/admin-guide/syscall-user-dispatch.html),
with a prctl that specifies a particular instruction pointer?


