6.13+ uml crash

Benjamin Berg benjamin at sipsolutions.net
Tue Apr 7 03:21:01 PDT 2026


Hi,

On Tue, 2026-04-07 at 02:47 -0700, Maciej Żenczykowski wrote:
> On Tue, Apr 7, 2026 at 5:01 PM Benjamin Berg <benjamin at sipsolutions.net> wrote:
> > you get an ESRCH error, which means that either the UML userspace
> > process does not exist or maybe we do not have permission to trace for
> > some reason.
> > 
> > Now, the difference is that before the patch UML would just clone to
> > create the userspace processes. After the patch, it will execve() into
> > a separate executable that exists only as a memfd.
> > 
> > I am noticing now, that we are doing the PTRACE_TRACEME inside the new
> > executable instead of the usual method of doing it before execve. So
> > maybe that makes a difference with AppArmor.
> > 
> > So, I think you can do two things:
> >    1. As a quick workaround, simply use "seccomp=on".
> >       However, the sandboxing of the userspace processes is not quite
> >       as secure if you do that (i.e. they currently can break out and
> >       make host syscalls).
> 
> I tried the breaking commit, and 6.13.latest but it doesn't appear to
> support seccomp=on
> 
> I tried 6.18.latest but it fails to build for me with
> 
> In file included from ./include/linux/unwind_user.h:6,
>                  from ./include/linux/unwind_deferred.h:6,
>                  from kernel/fork.c:108:
> ./arch/x86/include/asm/unwind_user.h: In function ‘unwind_user_word_size’:
> ./arch/x86/include/asm/unwind_user.h:23:17: error: ‘struct pt_regs’
> has no member named ‘flags’
>    23 |         if (regs->flags & X86_VM_MASK)
>       |                 ^~
>   CC      net/ipv6/addrconf.o
> ./arch/x86/include/asm/unwind_user.h:23:27: error: ‘X86_VM_MASK’
> undeclared (first use in this function)
>    23 |         if (regs->flags & X86_VM_MASK)
>       |                           ^~~~~~~~~~~
> ./arch/x86/include/asm/unwind_user.h:23:27: note: each undeclared
> identifier is reported only once for each function it appears in
>   CC      block/blk-settings.o
>   CC      net/core/net_namespace.o
> ./arch/x86/include/asm/unwind_user.h:26:14: error: implicit
> declaration of function ‘user_64bit_mode’
> [-Wimplicit-function-declaration]
>    26 |         if (!user_64bit_mode(regs))
>       |              ^~~~~~~~~~~~~~~
> 
> it's likely related to some combination of kconfig options needed for
> android net tests...
> 
> 6.16.latest builds but still crashes (not sure if it supports
> seccomp=on), but the crash is different:

The trace below is from seccomp mode. In this case, it looks to me like
the stub process died during startup. My guess is that in the ptrace
case the stub did execute, but we could not trace it (i.e. it likely
got as far as stopping itself with SIGSTOP in real_init, see below).

I guess you'll need to debug it some more.

You could try adding more logging, e.g. printing the status code in
os_reap_child. That information is missing from the log and may tell
you what failed when starting the stub process.
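
To make the logging suggestion a bit more concrete, here is a minimal
sketch of decoding a wait status in plain C. The names
decode_wait_status, log_child_report and struct child_report are made
up for illustration. This is not the code in os_reap_child, and inside
UML you would print through the kernel's own logging helpers rather
than fprintf.

```c
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical names, for illustration only. */
enum child_fate { CHILD_EXITED, CHILD_SIGNALED, CHILD_STOPPED, CHILD_OTHER };

struct child_report {
	enum child_fate fate;
	int value;	/* exit code or signal number, 0 for CHILD_OTHER */
};

/* Split the raw status from waitpid() into "what happened" and a number. */
struct child_report decode_wait_status(int status)
{
	struct child_report r = { CHILD_OTHER, 0 };

	if (WIFEXITED(status)) {
		r.fate = CHILD_EXITED;
		r.value = WEXITSTATUS(status);
	} else if (WIFSIGNALED(status)) {
		r.fate = CHILD_SIGNALED;
		r.value = WTERMSIG(status);
	} else if (WIFSTOPPED(status)) {
		r.fate = CHILD_STOPPED;
		r.value = WSTOPSIG(status);
	}
	return r;
}

/* Print one line describing the child; the raw status is kept for reference. */
void log_child_report(int pid, int status)
{
	static const char *const what[] = {
		"exited with code", "killed by signal",
		"stopped by signal", "unknown status",
	};
	struct child_report r = decode_wait_status(status);

	fprintf(stderr, "child %d: %s %d (raw status 0x%x)\n",
		pid, what[r.fate], r.value, status);
}
```

Called with the status that waitpid() filled in, this distinguishes a
stub that exited with an error code from one that was killed by a
signal.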

So, what you are looking at is:
 * start_userspace, clone() a new process
 * userspace_tramp, sets up the FDs
 * userspace_tramp, execve() the memfd
 * real_init in stub_exe.c continues and sets up the stub
   - sets up required memory maps and FDs
   - in seccomp mode, installs SECCOMP filters, etc.
   - in ptrace mode, does PTRACE_TRACEME and sends SIGSTOP to itself

Note that you cannot do much in stub_exe.c, which is why it uses
different exit codes to convey the error.
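
As an illustration of that exit-code convention, here is a small
sketch where the child reports the step that failed through its exit
code and the parent reads it back. The step names and values are
hypothetical and chosen for this example; the actual codes used in
stub_exe.c may be different.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical step codes; not the values used by stub_exe.c. */
enum setup_step {
	STEP_OK = 0,
	STEP_MAP_FAILED = 1,
	STEP_FD_FAILED = 2,
	STEP_FILTER_FAILED = 3,
};

/* Human-readable name for a step code, for use in a log message. */
const char *step_name(int code)
{
	switch (code) {
	case STEP_OK:			return "no error";
	case STEP_MAP_FAILED:		return "memory map setup";
	case STEP_FD_FAILED:		return "FD setup";
	case STEP_FILTER_FAILED:	return "filter installation";
	default:			return "unknown step";
	}
}

/*
 * Fork a child that pretends to fail at the given step and return the
 * exit code the parent observes, or -1 if the child did not exit normally.
 */
int run_child_failing_at(enum setup_step fail_at)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* The child cannot log, so the exit code is its only channel. */
		_exit(fail_at);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

The parent can then log step_name() of the returned code next to the
raw value.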

In seccomp mode you should be able to "strace -f" the UML process,
which should show you what is happening to the stub.

Not sure I can help more than this though.

Benjamin

> Run /sbin/net_test.sh as init process
> Unexpectedly lost MM child! Affected tasks will segfault.
> wait_stub_done_seccomp : failed to wait for stub, pid = -1, errno = 0
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> CPU: 0 UID: 0 PID: 1 Comm: net_test.sh Not tainted 6.16.12-g7e13344d2211 #4 NONE
> Stack:
>  80803b60 6063f23b 6051b353 00000000
>  00000001 6063f23b 606314ef 60027dfc
>  80803b90 6002f679 60c70000 609374c0
> Call Trace:
>  [<60027dfc>] ? _printk+0x0/0x57
>  [<6003237a>] show_stack+0x10b/0x11a
>  [<6051b353>] ? dump_stack_print_info+0x0/0x12b
>  [<60027dfc>] ? _printk+0x0/0x57
>  [<6002f679>] dump_stack_lvl+0x5e/0x79
>  [<6002f6ae>] dump_stack+0x1a/0x1c
>  [<60026626>] panic+0x156/0x377
>  [<6003ff78>] ? unblock_signals+0x25/0x28
>  [<6003ffa1>] ? um_set_signals+0x26/0x3e
>  [<600264d0>] ? panic+0x0/0x377
>  [<60049fd8>] do_exit+0x205/0x9f6
>  [<60162f26>] ? kmem_cache_free+0x118/0x12b
>  [<6004a9ff>] sys_exit_group+0x0/0x16
>  [<60055d9c>] get_signal+0x7dc/0x803
>  [<600555c0>] ? get_signal+0x0/0x803
>  [<6004349c>] ? setup_signal_stack_si+0x0/0x1f0
>  [<60055166>] ? signal_setup_done+0x0/0xb1
>  [<600320f5>] do_signal+0x54/0x1ce
>  [<60539370>] ? _raw_spin_unlock_irqrestore+0x0/0x39
>  [<60032db6>] fatal_sigsegv+0x32/0x3e
>  [<60041ac6>] wait_stub_done_seccomp+0x2b1/0x2c0
>  [<60041efa>] userspace+0xff/0x694
>  [<60031769>] ? interrupt_end+0x0/0xac
>  [<60179ba4>] ? copy_strings_kernel+0x0/0x9e
>  [<6003166a>] new_thread_handler+0x5a/0x5e
> /aosp-tests/net/test/run_net_test.sh: line 503: 939215 Aborted
>            (core dumped) $KERNEL_BINARY umid=net_test mem=512M
> seccomp=on $blockdevice=$ROOTFS $netconfig $consolemode ssl3=null,fd:3
> $cmdline 1>&2 3> "${SSL3}"
> Warning: UML exited with 134 instead of zero.
> 
> >    2. Move the ptrace(PTRACE_TRACEME) call into userspace_tramp to
> >       check if AppArmor is permitting ptrace then.
> 
> haven't yet had time to try this approach.



> 
> > 
> > Benjamin
> > 
> > On Tue, 2026-04-07 at 15:15 +0900, Maciej Żenczykowski wrote:
> > > On Tue, Apr 7, 2026 at 2:56 PM Berg, Johannes <johannes.berg at intel.com> wrote:
> > > > Hi,
> > > > 
> > > > Haven't looked at this yet - but really better if you CC the UML list.
> > > > 
> > > > johannes
> > > > 
> > > > > -----Original Message-----
> > > > > From: Maciej Żenczykowski <maze at google.com>
> > > > > Sent: Saturday, April 4, 2026 5:25 AM
> > > > > To: Berg, Benjamin <benjamin.berg at intel.com>; Berg, Johannes
> > > > > <johannes.berg at intel.com>; Tiwei Bie <tiwei.btw at antgroup.com>
> > > > > Subject: Re: 6.13+ uml crash
> > > > > 
> > > > > On Fri, Apr 3, 2026 at 12:34 PM Maciej Żenczykowski <maze at google.com>
> > > > > wrote:
> > > > > > 
> > > > > > Host is:
> > > > > > 
> > > > > > > 6.18.14-1rodete1-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.18.14-1rodete1
> > > > > > > (2026-03-06) x86_64 GNU/Linux
> > > > > > 
> > > > > > /proc/cmdline
> > > > > > ima_hash=sha256 kfence.sample_interval=100 intel_iommu=sm_off
> > > > > > pci=noats printk.devkmsg=on slab_nomerge
> > > > > > > lsm=landlock,lockdown,yama,loadpin,safesetid,integrity,apparmor,selinux,smack,tomoyo,bpf
> > > > > > apparmor=1 panic=30 glinux-boot-image=default-20260324.03.00
> > > > > > earlycon=uart8250,io,0x3f8 console=ttyS0,115200n8 console=hvc0
> > > > > > console=tty0 splash plymouth.ignore-serial-consoles i915.enable_psr=0
> > > > > > 
> > > > > > No SELinux, but I think there is likely some corporate AppArmor policy...
> > > > > > 
> > > > > > I seem to hit this problem while trying to use UML on latest
> > > > > > 6.13/6.14/6.15/6.16/6.17/6.18/6.19
> > > > > > 6.12 guests are the last ones that work.
> > > > > > 
> > > > > > I noticed 6.13 had many UML changes...
> > > > > > 
> > > > > > This is while trying to run the android net tests.
> > > > > > 
> > > > > > > https://android.googlesource.com/kernel/tests/+log/refs/heads/mirror-goog-main-kernel
> > > > > > > /aosp-tests/net/test/run_net_test.sh --builder all_tests.sh
> > > > > > 
> > > > > > ...
> > > > > > VFS: Mounted root (ext3 filesystem) readonly on device 98:0.
> > > > > > devtmpfs: mounted
> > > > > > This architecture does not have kernel memory protection.
> > > > > > Run /sbin/net_test.sh as init process
> > > > > > Registers -
> > > > > >         0       0x60025a90
> > > > > >         1       0x2e2ea5
> > > > > >         2       0x600423e4
> > > > > >         3       0x2e2ea8
> > > > > >         4       0x7fffffffbe30
> > > > > >         5       0x0
> > > > > >         6       0x206
> > > > > >         7       0x0
> > > > > >         8       0x7fffffffbe30
> > > > > >         9       0xffffffff
> > > > > >         10      0x0
> > > > > >         11      0x7ffff7c40c87
> > > > > >         12      0x0
> > > > > >         13      0x13
> > > > > >         14      0x2e2ea8
> > > > > >         15      0x3e
> > > > > >         16      0x7fffffffc000
> > > > > >         17      0x33
> > > > > >         18      0x206
> > > > > >         19      0x7fffffffeff8
> > > > > >         20      0x2b
> > > > > >         21      0x7ffff7f9d740
> > > > > >         22      0x0
> > > > > >         23      0x0
> > > > > >         24      0x0
> > > > > >         25      0x0
> > > > > >         26      0x0
> > > > > > > Kernel panic - not syncing: do_syscall_stub : PTRACE_SETREGS failed, errno = 3  [ESRCH]
> > > > > > > CPU: 0 UID: 0 PID: 1 Comm: net_test.sh Not tainted 6.13.12-g2abfa5d47651 #3
> > > > > > Stack:
> > > > > >  600277e4 60629dd1 80803d80 00000000
> > > > > >  00000001 60629dd1 6061c0a9 600277e4
> > > > > >  80803db0 6002efd2 60c70000 609171c0
> > > > > > Call Trace:
> > > > > >  [<600277e4>] ? _printk+0x0/0x57
> > > > > >  [<600322ec>] show_stack+0x10b/0x11a
> > > > > >  [<600277e4>] ? _printk+0x0/0x57
> > > > > >  [<600277e4>] ? _printk+0x0/0x57
> > > > > > >  [<6002efd2>] dump_stack_lvl+0x5e/0x79
> > > > > > >  [<6002f007>] dump_stack+0x1a/0x1c
> > > > > > >  [<60026838>] panic+0x156/0x377
> > > > > > >  [<600277e4>] ? _printk+0x0/0x57
> > > > > > >  [<600266e2>] ? panic+0x0/0x377
> > > > > > >  [<600277e4>] ? _printk+0x0/0x57
> > > > > > >  [<6004555b>] do_syscall_stub+0xa6/0x132
> > > > > > >  [<60031608>] ? interrupt_end+0x0/0xae
> > > > > > >  [<6004560b>] syscall_stub_flush+0x24/0x2e
> > > > > > >  [<60045daf>] userspace+0x81/0x469
> > > > > > >  [<60031608>] ? interrupt_end+0x0/0xae
> > > > > > >  [<601761ba>] ? copy_strings_kernel+0x0/0x9e
> > > > > > >  [<60031509>] new_thread_handler+0x5a/0x5e
> > > > > > /aosp-tests/net/test/run_net_test.sh: line 498: 3026597 Aborted
> > > > > >             (core dumped) $KERNEL_BINARY umid=net_test mem=512M
> > > > > > $blockdevice=$ROOTFS $netconfig $consolemode ssl3=null,fd:3 $cmdline
> > > > > > 1>&2 3> "${SSL3}"
> > > > > > Warning: UML exited with 134 instead of zero.
> > > > > 
> > > > > It appears this broke at:
> > > > > 
> > > > > commit 32e8eaf263d9be014ba1970444f745682fa9c6c0
> > > > > Author: Benjamin Berg <benjamin.berg at intel.com>
> > > > >     um: use execveat to create userspace MMs
> > > 
> > > --
> > > Maciej Żenczykowski, Kernel Networking Developer @ Google
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google


