[RFC PATCH 05/13] x86/um: nommu: syscall translation by zpoline

Hajime Tazaki thehajime at gmail.com
Sat Oct 26 00:36:03 PDT 2024


On Sat, 26 Oct 2024 00:20:49 +0900,
Johannes Berg wrote:
> 
> On Fri, 2024-10-25 at 21:58 +0900, Hajime Tazaki wrote:
> > 
> > > > +	if (down_write_killable(&mm->mmap_lock)) {
> > > > +		err = -EINTR;
> > > > +		return err;
> > > 
> > > ?
> > 
> > the lock isn't needed actually so, will remove it.
> 
> Oh, I was just looking at the weird handling of the err variable :)

Ah, now I see. I'd revert the lock part with `return -EINTR` instead.

> > > What happens if the binary JITs some code and you don't find it? I don't
> > > remember from your talk - there you seemed to say this was fine just
> > > slow, but that was zpoline in a different context (container)?
> > 
> > instructions loaded after the execve family (like JIT-generated code,
> > code loaded with dlopen, etc.) aren't going to be translated.  we can
> > translate them by tweaking the userspace loader (ld.so w/ LD_PRELOAD)
> > or by hooking the mprotect(2) syscall before executing JIT-generated
> > code.  a generic description is written in the document ([12/13]).
> 
> Guess I should've read that, sorry.

No, no. This part is a completely new feature and I'd like to explain
any unclear points to help understanding, so any input is always
welcome.

# btw, the talk at the last netdev was not specific to a container
  context; it focused more on the syscall hook mechanism itself, so I
  didn't go into much detail at that time.
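
To make the translation step discussed above a bit more concrete, here
is a minimal userspace sketch of the zpoline idea: replace each 2-byte
`syscall` (0f 05) or `sysenter` (0f 34) instruction with the 2-byte
`call *%rax` (ff d0). The function name translate_syscalls() is
hypothetical and this is not the code from the patch. It also assumes a
naive byte-pattern scan, which can wrongly match bytes inside other
instructions or data, so a real implementation would need to decode
instructions properly.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): scan a code buffer and
 * rewrite syscall (0f 05) / sysenter (0f 34) into call *%rax (ff d0).
 * Assumption: a plain byte-pattern scan is good enough for illustration.
 * Returns the number of sites patched.
 */
static size_t translate_syscalls(uint8_t *code, size_t len)
{
	size_t patched = 0;

	for (size_t i = 0; i + 1 < len; i++) {
		if (code[i] != 0x0f)
			continue;
		if (code[i + 1] != 0x05 && code[i + 1] != 0x34)
			continue;

		code[i] = 0xff;		/* call *%rax, first byte */
		code[i + 1] = 0xd0;	/* call *%rax, second byte */
		patched++;
		i++;			/* skip the byte we just rewrote */
	}
	return patched;
}
```

Such a pass only covers the bytes it is given. Code that appears later
(JIT output, dlopen'ed objects) stays untranslated unless the pass is
run again on the new region, for example from the loader or an
mprotect(2) hook as mentioned above.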

> > > Perhaps UML could additionally install a seccomp filter or something on
> > > itself while running a userspace program? Hmm.
> > 
> > I'm trying to understand the purpose of the seccomp filter you
> > suggested here; is it for preventing syscalls from being executed by
> > untranslated code ?
> 
> Yeah, that's what I was wondering.
> 
> Obviously you have to be able to get rid of the seccomp filter again so
> it's not foolproof, but perhaps not _that_ bad?
> 
> I'm not worried about security or so, it's clear this isn't even _meant_
> to have security. But I do wonder about really hard to debug issues if
> userspace suddenly makes syscalls to the host, that'd be ... difficult
> to understand?

I totally understand; I faced similar situations while developing this
patchset.

Originally our patchset had a whitelist-based seccomp filter (w/
SCMP_ACT_ALLOW), but I dropped it from this RFC as I found that 1) it
is not a !MMU-specific feature (it can be generally applied to all UML
use cases), and 2) we cannot prevent a syscall from userspace (e.g.,
ioctl(2)) if it is white-listed in our seccomp filter, so it would
still reach the host; thus the newly introduced filter may not be
perfect.

Maintaining the whitelist is also not easy; the syscall used in one
version may change at some point in the future (in my case,
SCMP_SYS(open) had to be changed to SCMP_SYS(openat)).

-- Hajime


