[PATCH 0/7] um: skas: harden the seccomp userspace stub

Mon Jun 22 05:08:21 PDT 2026

On Fri, 2026-06-19 at 20:22 -0700, Cong Wang wrote:
> From: Cong Wang <cwang at multikernel.io>
> 
> In the seccomp ("SECCOMP") userspace mode, each guest userspace process
> runs in a stub under a seccomp filter and traps to the monitor (the UML
> kernel) on every syscall. Two items on the stub.c "Known security issues"
> list could not be addressed by the filter alone:
> 
>   - a hijacked stub could mmap() arbitrary physmem offsets, which is an
>     intra-guest disclosure and, on this base (single physmem fd, no
>     kernel/user split), a host escape; and
> 
>   - a hijacked stub could block SIGALRM via a crafted rt_sigreturn to
>     evade preemption and wedge the monitor indefinitely.
> 
> This series closes both:
> 
>   1-2: route the stub's mmap() through a SECCOMP_RET_USER_NOTIF listener
>        owned by the monitor (no behavioural change yet).
>   3-4: validate each mmap() against the mm's page table -- allowed iff the
>        PTE already maps the requested frame with no more access than it
>        grants -- including out-of-batch mmaps a hijacked stub issues on
>        its own.
>   5:   route and validate munmap() the same way (range-confined below
>        STUB_START).
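
A rough, untested sketch of the rule in 3-5 as described above, written as a toy model. The struct, the constants (GUEST_PAGE_SIZE, the P_* bits) and the function names are hypothetical. STUB_START here is a placeholder value, not UML's. The real patches work on UML's page tables, not on an array indexed by virtual page number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protection bits and constants, for illustration only. */
#define P_READ  0x1u
#define P_WRITE 0x2u
#define P_EXEC  0x4u
#define GUEST_PAGE_SIZE 4096ULL
#define STUB_START 0x7fbfff000000ULL   /* placeholder, not UML's real value */

static_assert(STUB_START % GUEST_PAGE_SIZE == 0, "STUB_START must be page aligned");

/* Toy page-table entry: which physmem offset a guest page maps, and how. */
struct toy_pte {
	bool present;
	uint64_t phys_off;   /* byte offset into the physmem fd */
	unsigned int prot;   /* access the guest page table grants */
};

/*
 * mmap(addr, len, prot, ..., physmem_fd, off) is allowed iff every page in
 * [addr, addr + len) is already mapped by the page table to the matching
 * physmem page and the request asks for no more access than the PTE grants.
 * pt[] is indexed by guest virtual page number; lengths must be page
 * multiples here for simplicity.
 */
static bool mmap_allowed(const struct toy_pte *pt, size_t nr_pages,
			 uint64_t addr, uint64_t len, unsigned int prot,
			 uint64_t off)
{
	if (len == 0 || addr % GUEST_PAGE_SIZE || off % GUEST_PAGE_SIZE ||
	    len % GUEST_PAGE_SIZE)
		return false;
	if (off > UINT64_MAX - len)   /* file range would wrap around */
		return false;

	uint64_t first = addr / GUEST_PAGE_SIZE;
	uint64_t count = len / GUEST_PAGE_SIZE;
	if (first >= nr_pages || count > nr_pages - first)
		return false;

	for (uint64_t i = 0; i < count; i++) {
		const struct toy_pte *pte = &pt[first + i];

		if (!pte->present || pte->phys_off != off + i * GUEST_PAGE_SIZE)
			return false;
		if (prot & ~pte->prot)   /* asks for more than the PTE grants */
			return false;
	}
	return true;
}

/* munmap(addr, len) is allowed iff the whole range lies below STUB_START. */
static bool munmap_allowed(uint64_t addr, uint64_t len)
{
	if (len == 0 || addr % GUEST_PAGE_SIZE)
		return false;
	/* Overflow-safe form of "addr + len <= STUB_START". */
	return addr < STUB_START && len <= STUB_START - addr;
}
```

The same check would apply unchanged to the out-of-batch mmaps from patch 4, since it only looks at the request and the page table, not at who issued the call.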

That approach seems odd to me. Adding an explicit out-of-band check
means two extra context switches per mmap() syscall, and I would
expect that to make the SECCOMP mode a lot slower than ptrace(). I
still think it is possible to carefully craft a SECCOMP filter,
together with the stub and kernel code, that makes exploitation
impossible in the non-SMP case.

The true SMP case is more complicated, but we do not have that anyway,
so I would not worry about it for now.

Did you run any performance tests?

>   6:   add a watchdog thread that detects a stub which stops reporting
>        back (e.g. blocked SIGALRM) and SIGKILLs it, letting the monitor
>        recover via the existing teardown.

That also seems like an odd solution to me. Architecturally, UML first
receives the SIGALRM and forwards it to the child. It would seem much
easier to set a flag when forwarding the signal and clear it again
once the process reports back that it received the SIGALRM. Then, when
the kernel receives the next SIGALRM, just kill the child immediately
if the flag is still set.
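
A minimal sketch of that flag scheme as a toy model. The names struct child_state, on_host_sigalrm() and on_child_reported_alarm() are hypothetical. The actual forwarding and kill() calls are only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-child bookkeeping in the UML kernel (toy model). */
struct child_state {
	bool alarm_pending;  /* SIGALRM forwarded, not yet acknowledged */
	bool killed;
};

enum alarm_action { FORWARD_SIGALRM, KILL_CHILD };

/* Called when the host delivers SIGALRM to UML while this child runs. */
static enum alarm_action on_host_sigalrm(struct child_state *c)
{
	assert(!c->killed);  /* a killed child is never scheduled again */

	if (c->alarm_pending) {
		/* The previous SIGALRM was never acknowledged, so the stub
		 * is blocking it. Real code would SIGKILL the child here. */
		c->killed = true;
		return KILL_CHILD;
	}

	/* Real code would forward SIGALRM to the child here. */
	c->alarm_pending = true;
	return FORWARD_SIGALRM;
}

/* Called when the child reports back that it took the forwarded SIGALRM. */
static void on_child_reported_alarm(struct child_state *c)
{
	c->alarm_pending = false;
}
```

Unlike a watchdog thread, this needs no extra thread or timeout. A stub that keeps SIGALRM blocked is caught at the first timer expiry after the one it ignored.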

>   7:   drop the now-resolved "Known security issues" note and refresh the
>        seccomp= help text.

Benjamin

> After the series a hijacked stub is confined to the frames its own page
> tables reference and can no longer reach arbitrary guest/host memory; one
> that evades preemption is detected out of band and killed rather than
> wedging the monitor.
> 
> Verified on UML (UP and 2-CPU SMP): boots and survives fork/exec storms
> and heavy mmap/munmap churn with zero false denials or false kills; an
> artificially SIGALRM-blocked busy loop is killed in ~5s and the monitor
> recovers, while syscall-making processes are untouched. Each patch builds
> and the series is bisectable.
> 
> ---
> Cong Wang (7):
>   um: skas: create a seccomp USER_NOTIF listener and hand it to the
>     monitor
>   um: skas: gate stub mmap() through the USER_NOTIF monitor
>   um: skas: validate stub mmap() against the guest page table
>   um: skas: handle out-of-batch stub mmap notifications
>   um: skas: validate stub munmap() against the guest address range
>   um: skas: kill stubs that block SIGALRM via a watchdog thread
>   um: skas: refresh stub security notes after closing the known issues
> 
>  arch/um/include/shared/skas/mm_id.h |   1 +
>  arch/um/include/shared/skas/skas.h  |   5 +
>  arch/um/kernel/skas/stub.c          |  22 --
>  arch/um/kernel/skas/stub_exe.c      |  19 +-
>  arch/um/kernel/skas/uaccess.c       |  48 +++++
>  arch/um/os-Linux/skas/process.c     | 315 ++++++++++++++++++++++++----
>  arch/um/os-Linux/start_up.c         |   6 -
>  7 files changed, 344 insertions(+), 72 deletions(-)
> 
> 
> base-commit: 1a3746ccbb0a97bed3c06ccde6b880013b1dddc1