[RFC PATCH 00/13] nommu UML

Hajime Tazaki thehajime at gmail.com
Wed Oct 30 02:25:18 PDT 2024


Hello,

On Mon, 28 Oct 2024 22:32:43 +0900,
Benjamin Berg wrote:

> > > > - a crash on userspace programs crashes a UML kernel, not signaling
> > > >   with SIGSEGV to the program.
> > > > - commit c27e618 (during v6.12-rc1 merge) introduces invalid access to
> > > >   a vma structure for our case, which updates the internal procedure
> > > >   of maple_tree subsystem.  We're trying to fix issue but still a
> > > >   random process on exit(2) crashes.
> > > 
> > > Btw. are you handling FP register save/restore? If it is not there, it
> > > probably would not be too hard to add (XSAVE, etc.), though it might
> > > add a bit of additional overhead. Especially as UML always saves the FP
> > > state rather than optimizing it like the x86 architectures.
> > 
> > The patch handles fp register on entry/leave at syscall; [07/13] patch
> > contains this part.
> 
> That looks like FS/GS registers which are for thread-local storage. I
> was talking about floating point registers. Maybe you meant another
> patch?

oh, this is my terrible mistake...
no, the patch doesn't handle FP registers at all.

> > I'm not familiar with that but what kind of optimizations does x86
> > architecture do for fp register handling ?
> 
> The kernel does not usually need the FP registers. So it optimizes the
> pretty common case of a userspace -> kernel -> userspace switch that
> happens for a syscall by simply not saving/restoring these registers at
> all.
> 
> Obviously, it then still needs to do the work when the task is switched
> or in the rare case that the kernel wants to use floating point itself.

thanks for the information.

> > > I am a bit confused overall. I mean, zpoline seems kind of neat, but a
> > > requirement on patching userspace code also seems like a lot.
> > > 
> > > To me, it seems much more natural to catch the userspace syscalls using
> > > a SECCOMP filter[1]. While quite a lot slower, that should be much more
> > > portable across architectures. For improved speed one could still do
> > > architecture specific things inside the vDSO or by using zpoline. But
> > > those would then "just" be optimizations and unpatched code would still
> > > work correctly (e.g. JIT).
> > 
> > I'm not proposing this patch to replace existing UML implementations;
> > for instance, the patchset cannot run CONFIG_MMU code in the whole
> > kernel tree so, existing ptrace-based implementation still has real
> > usecase.  and ptrace based syscall hook is not indeed fast and the
> > improvements with seccomp filter instead clearly has benefits.  I
> > think it's independent to this patchset.
> 
> Of course. nommu mode is a completely independent feature.
> 
> I am still wondering a bit about the users for such a mode. It is not
> interesting for us as we use it for testing. Of course, speed is nice
> but it is not the primary objective.
> 
> I understand that it can be an approach for a small "container", but
> then you would need a very strict SECCOMP filter for the kernel itself.

I didn't specifically describe the use cases for this in the v1 patch,
but here is at least the list I have in mind.

1) a container-like use case can be one of them (the original work was
proposed toward this),
2) testing nommu code in the kernel might be another use,
3) faster I/O workloads which involve a bunch of syscalls over UML can
also be interesting.

I think this list pretty much covers the reasons to have a !MMU mode in
the current MMU-full UML.

speed might indeed not be the primary objective, but if you have
dozens of test cases which issue a bunch of syscalls (which I think is
a possible case), this might be helpful.

(snip)

> > > For me, a big argument in favour of such an approach is its simplicity.
> > > I am mostly basing that on the fact that this patchset should properly
> > > handle other signals like SIGFPE and SIGSEGV. And, once it does that,
> > > you will already have all the infrastructure to do the correct register
> > > save/restore using the host mcontex, which is what is needed in the
> > > SIGSYS handler when using SECCOMP. The filter itself should be simple
> > > as it just needs to catch all syscalls within valid userspace
> > > executable memory[2] ranges.
> > 
> > I agree with your observation that the approach is simple.
> > I don't have a good idea on how to handle SIGSEGV, but will try to see
> > with your inputs.
> 
> You can probably use "[RFC PATCH v2 5/9] um: Add helper functions to
> get/set state for SECCOMP" for getting the registers and also writing
> them back if you want to restore using rt_sigreturn.

thanks,

I'm still testing various attempts to deliver SIGSEGV to userspace,
but no luck so far...  I will get back to you once I come up with a
nice form.

(snip)
> > > [2] I am assuming that userspace executable code is already confined to
> > > a certain address space within the UML process. Obviously, the kernel
> > > itself and loaded modules need to be free to do host syscalls and
> > > should not be affected by the SECCOMP filter.
> > 
> > I think our !MMU UML doesn't break this assumption.  But did you see
> > something to our patchset ?
> 
> I also assume that is fine. One just needs to understand this when
> writing a SECCOMP filter for syscall emulation in nommu mode.

okay, thanks for the clarification.
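
To check my understanding of [2], such a filter could look roughly like
the sketch below: trap (SIGSYS) any syscall whose instruction pointer is
inside the userspace range, and allow everything else so the kernel
itself can still issue host syscalls. This is only a sketch under
assumptions: a little-endian 64-bit host, a range [start, end) that does
not cross a 4 GiB boundary, and a hypothetical helper name
(build_range_filter); it is not part of the patchset.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* seccomp_data.instruction_pointer is 64-bit; classic BPF loads 32 bits
 * at a time. Little-endian layout is assumed here. */
#define IP_LO (offsetof(struct seccomp_data, instruction_pointer))
#define IP_HI (IP_LO + 4)
#define RANGE_FILTER_LEN 7

/* Fill f[0..6] with a filter that returns SECCOMP_RET_TRAP when the
 * syscall was issued from [start, end) and SECCOMP_RET_ALLOW otherwise.
 * Returns the number of instructions, or 0 if the range is empty or
 * crosses a 4 GiB boundary (not handled by this sketch). */
size_t build_range_filter(struct sock_filter *f, uint64_t start, uint64_t end)
{
    uint32_t hi = (uint32_t)(start >> 32);
    size_t n = 0;

    if (start >= end || (uint32_t)((end - 1) >> 32) != hi)
        return 0;

    /* [0] A = upper 32 bits of the instruction pointer */
    f[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_HI);
    /* [1] upper bits differ -> jump to ALLOW at [6] */
    f[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, hi, 0, 4);
    /* [2] A = lower 32 bits of the instruction pointer */
    f[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_LO);
    /* [3] below start -> ALLOW */
    f[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K,
                                          (uint32_t)start, 0, 2);
    /* [4] above the last byte of the range -> ALLOW */
    f[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGT | BPF_K,
                                          (uint32_t)(end - 1), 1, 0);
    /* [5] inside the userspace range: deliver SIGSYS */
    f[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP);
    /* [6] kernel or module code: let the host syscall through */
    f[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
    return n;
}
```

The resulting array would be wrapped in a struct sock_fprog and
installed with prctl(PR_SET_NO_NEW_PRIVS) followed by
seccomp(SECCOMP_SET_MODE_FILTER), with the SIGSYS handler doing the
actual syscall emulation.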

-- Hajime


