[RFC PATCH 00/13] nommu UML

Sun Oct 27 02:10:30 PDT 2024

Hello Benjamin,

Thank you for taking the time to look at this.

On Sat, 26 Oct 2024 19:19:08 +0900,
Benjamin Berg wrote:

> > - a crash on userspace programs crashes a UML kernel, not signaling
> >   with SIGSEGV to the program.
> > - commit c27e618 (during v6.12-rc1 merge) introduces invalid access to
> >   a vma structure for our case, which updates the internal procedure
> >   of maple_tree subsystem.  We're trying to fix issue but still a
> >   random process on exit(2) crashes.
> 
> Btw. are you handling FP register save/restore? If it is not there, it
> probably would not be too hard to add (XSAVE, etc.), though it might
> add a bit of additional overhead. Especially as UML always saves the FP
> state rather than optimizing it like the x86 architectures.

The patchset saves and restores the FP registers on syscall entry and
exit; patch [07/13] contains this part.

I'm not familiar with this area: what kind of optimizations does the
x86 architecture do for FP register handling?

> I am a bit confused overall. I mean, zpoline seems kind of neat, but a
> requirement on patching userspace code also seems like a lot.
> 
> To me, it seems much more natural to catch the userspace syscalls using
> a SECCOMP filter[1]. While quite a lot slower, that should be much more
> portable across architectures. For improved speed one could still do
> architecture specific things inside the vDSO or by using zpoline. But
> those would then "just" be optimizations and unpatched code would still
> work correctly (e.g. JIT).

I'm not proposing this patchset as a replacement for the existing UML
implementation; for instance, it cannot run the CONFIG_MMU code in the
kernel tree, so the existing ptrace-based implementation still has
real use cases.  The ptrace-based syscall hook is indeed not fast, and
replacing it with a seccomp filter clearly has benefits, but I think
that is independent of this patchset.

So I think this patchset can exist in parallel while your seccomp
patches are also under review.

Btw, although I mentioned in a different reply that JIT-generated code
is not currently handled, it can be implemented as an extension to
this patchset; the original implementation of zpoline is now able to
patch JIT-generated code as well.

https://github.com/yasukata/zpoline/pull/20/commits/c42af16757ad3fcdf7084c9f2139bb9105796873

It is not implemented in this patchset for the moment.

In terms of portability, the basic idea of the zpoline syscall hook is
applicable to other platforms, such as aarch64
(https://github.com/retrage/svc-hook), so I believe there is a chance
to extend this idea to architectures other than x86_64.

> For me, a big argument in favour of such an approach is its simplicity.
> I am mostly basing that on the fact that this patchset should properly
> handle other signals like SIGFPE and SIGSEGV. And, once it does that,
> you will already have all the infrastructure to do the correct register
> save/restore using the host mcontex, which is what is needed in the
> SIGSYS handler when using SECCOMP. The filter itself should be simple
> as it just needs to catch all syscalls within valid userspace
> executable memory[2] ranges.

I agree with your observation that the approach is simple.
I don't have a good idea yet on how to handle SIGSEGV, but I will look
into it with your input.
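
To make sure I understand the filter you describe, here is a minimal
sketch of a classic-BPF seccomp program that traps (SIGSYS) any
syscall issued from inside one address range and allows everything
else.  The helper name build_ip_range_filter() is hypothetical, and
the sketch assumes a little-endian 64-bit host and a range [start, end)
that does not touch a 4 GiB boundary; it is not taken from either
patchset, and a real filter would also check seccomp_data.arch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Offsets of the two 32-bit halves of the 64-bit instruction pointer.
 * Assumption: little-endian host, so the low word comes first. */
#define IP_LO ((uint32_t)offsetof(struct seccomp_data, instruction_pointer))
#define IP_HI (IP_LO + 4)

/* Hypothetical helper: fill f[] (at least 7 entries) with a filter that
 * returns SECCOMP_RET_TRAP when start <= ip < end, else SECCOMP_RET_ALLOW.
 * Returns the number of instructions written. */
size_t build_ip_range_filter(struct sock_filter *f, uint64_t start, uint64_t end)
{
	uint32_t hi = (uint32_t)(start >> 32);
	size_t n = 0;

	/* Assumption: the whole range shares the same upper 32 bits. */
	assert(start < end && (uint32_t)((end - 1) >> 32) == hi);
	assert((uint32_t)end != 0);

	/* [0] A = upper half of the instruction pointer */
	f[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_HI);
	/* [1] upper half differs: jump to ALLOW at [6] */
	f[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, hi, 0, 4);
	/* [2] A = lower half of the instruction pointer */
	f[n++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS, IP_LO);
	/* [3] below start: jump to ALLOW at [6] */
	f[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K,
					      (uint32_t)start, 0, 2);
	/* [4] at or above end: ALLOW at [6], otherwise fall through to TRAP */
	f[n++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K,
					      (uint32_t)end, 1, 0);
	/* [5] inside the range: deliver SIGSYS to the process */
	f[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP);
	/* [6] outside the range: let the host syscall proceed */
	f[n++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);
	return n;
}
```

The resulting array would be wrapped in a struct sock_fprog and
installed with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) after
setting PR_SET_NO_NEW_PRIVS; that part is omitted here.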

> Benjamin
> 
> [1] Maybe not surprising, as I have been working on a SECCOMP based UML
> that does not require ptrace.

Yes, I have been aware of it for a while.  I have also benchmarked
several hook mechanisms, including seccomp, with a simple getpid
measurement.

https://speakerdeck.com/thehajime/netdev0x18-zpoline?slide=16
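
For anyone who wants to reproduce this kind of measurement, a getpid
micro-benchmark can be as small as the sketch below.  It is only an
illustration (the function name getpid_ns_per_call() is hypothetical
and this is not the code used for the slides): it issues the raw
syscall in a loop and reports the average wall-clock cost per call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: average cost in nanoseconds of one getpid
 * syscall, measured over `iters` back-to-back calls. */
double getpid_ns_per_call(long iters)
{
	struct timespec t0, t1;
	double ns;
	long i;

	assert(iters > 0);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++)
		syscall(SYS_getpid);	/* raw syscall, bypasses any libc caching */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	/* Elapsed time in nanoseconds across the whole loop. */
	ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
	   + (double)(t1.tv_nsec - t0.tv_nsec);
	return ns / (double)iters;
}
```

Running the same binary natively, under ptrace-based UML, and under
each hook mechanism gives directly comparable per-call numbers.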

> [2] I am assuming that userspace executable code is already confined to
> a certain address space within the UML process. Obviously, the kernel
> itself and loaded modules need to be free to do host syscalls and
> should not be affected by the SECCOMP filter.

I think our !MMU UML doesn't break this assumption.  But did you
notice something in our patchset that suggests otherwise?

Thanks again,
-- Hajime