[RFC PATCH 05/13] x86/um: nommu: syscall translation by zpoline

Hajime Tazaki thehajime at gmail.com
Fri Oct 25 05:58:25 PDT 2024



On Fri, 25 Oct 2024 18:19:25 +0900,
Johannes Berg wrote:
> 
> On Thu, 2024-10-24 at 21:09 +0900, Hajime Tazaki wrote:
> > This commit adds a mechanism to hook syscalls for unmodified userspace
> > programs used under UML in !MMU mode. The mechanism, called zpoline,
> > translates syscall/sysenter instructions with `call *%rax`, which can be
> > processed by a trampoline code also installed upon an initcall during
> > boot. The translation is triggered by elf_arch_finalize_exec(), an arch
> > hook introduced by another commit.
> > 
> > All syscalls issued by userspace thus redirected to a speicific function,
> 
> typo: "specific"

thanks.

> > +	if (down_write_killable(&mm->mmap_lock)) {
> > +		err = -EINTR;
> > +		return err;
> 
> ?

the lock isn't actually needed, so I will remove it.

> What happens if the binary JITs some code and you don't find it? I don't
> remember from your talk - there you seemed to say this was fine just
> slow, but that was zpoline in a different context (container)?

instructions loaded after the execve family (like JIT-generated code,
code loaded with dlopen, etc.) aren't going to be translated.  we can
translate them by tweaking the userspace loader (ld.so w/ LD_PRELOAD)
or by hooking the mprotect(2) syscall before JIT-generated code is
executed.  a generic description is written in the document ([12/13]).
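
to illustrate what "translation" means here, below is a minimal sketch
of the rewrite step, not the code from this patch.  it assumes a naive
byte scan over a writable code buffer; translate_syscalls() is a
hypothetical name, and a real implementation needs an instruction-aware
scan so that 0f 05 bytes inside immediates or data are not patched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): replace every
 * syscall (0f 05) or sysenter (0f 34) byte pair in @code with
 * call *%rax (ff d0). Both encodings are 2 bytes long, so the
 * rewrite is done in place without moving any other instruction.
 *
 * Returns the number of patched sites.
 */
static size_t translate_syscalls(uint8_t *code, size_t len)
{
	size_t patched = 0;

	assert(code != NULL);

	for (size_t i = 0; i + 1 < len; i++) {
		if (code[i] != 0x0f)
			continue;
		if (code[i + 1] != 0x05 && code[i + 1] != 0x34)
			continue;

		code[i] = 0xff;		/* call *%rax, opcode byte */
		code[i + 1] = 0xd0;	/* call *%rax, ModRM byte */
		patched++;
		i++;			/* skip the byte just rewritten */
	}

	return patched;
}
```

the same routine could in principle be invoked from an mprotect(2)
hook on regions that become executable, which is the JIT case
mentioned above.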

> Perhaps UML could additionally install a seccomp filter or something on
> itself while running a userspace program? Hmm.

I'm trying to understand the purpose of the seccomp filter you suggested
here; is it for preventing syscalls from being issued by untranslated code?

> > +/**
> > + * setup trampoline code for syscall hooks
> > + *
> > + * the trampoline code guides to call hooked function, __kernel_vsyscall
> > + * in this case, via nop slides at the memory address zero (thus, zpoline).
> > + *
> > + * loaded binary by exec(2) is translated to call the function.
> > + */
> > +static int __init setup_zpoline_trampoline(void)
> > +{
> > +	int i, ret;
> > +	int ptr;
> > +
> > +	/* zpoline: map area of trampoline code started from addr 0x0 */
> > +	__zpoline_start = 0x0;
> > +
> > +	ret = os_map_memory((void *) 0, -1, 0, 0x1000, 1, 1, 1);
> 
> (UM_)PAGE_SIZE?

thanks, it's much better; will fix it.

> > +	/**
> > +	 * FIXME: shit red zone area to properly handle the case
> 
> "shift"? :)

thanks.

> > +	 */
> > +
> > +	/**
> > +	 * put code for jumping to __kernel_vsyscall.
> > +	 *
> > +	 * here we embed the following code.
> > +	 *
> > +	 * movabs [$addr],%r11
> > +	 * jmpq   *%r11
> > +	 *
> > +	 */
> > +	ptr = NR_syscalls;
> > +	/* 49 bb [64-bit addr (8-byte)]    movabs [64-bit addr (8-byte)],%r11 */
> > +	__zpoline_start[ptr++] = 0x49;
> > +	__zpoline_start[ptr++] = 0xbb;
> > +	__zpoline_start[ptr++] = ((uint64_t)
> > +				  __kernel_vsyscall >> (8 * 0)) & 0xff;
> 
> &0xff seems pointless with a u8 array?

agree, will fix it.
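
for reference, a rough sketch of how the trampoline bytes could be
emitted without the mask.  this is an illustration under assumptions,
not the next revision of the patch: build_trampoline() and its
parameters are hypothetical names, and the nop slide in front of the
stub follows the generic zpoline layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): fill @buf with
 *
 *   [0, nr_syscalls)   nop slide (0x90)
 *   49 bb <imm64>      movabs $target,%r11
 *   41 ff e3           jmpq   *%r11
 *
 * Returns the number of bytes written, or 0 if @buf is too small.
 */
static size_t build_trampoline(uint8_t *buf, size_t buf_len,
			       size_t nr_syscalls, uint64_t target)
{
	size_t n = 0;

	assert(buf != NULL);

	if (buf_len < nr_syscalls + 13)
		return 0;

	/* call *%rax lands at address == syscall number, then slides */
	while (n < nr_syscalls)
		buf[n++] = 0x90;

	buf[n++] = 0x49;
	buf[n++] = 0xbb;
	/* storing into a uint8_t keeps only the low byte; no & 0xff */
	for (int i = 0; i < 8; i++)
		buf[n++] = (uint8_t)(target >> (8 * i));

	buf[n++] = 0x41;
	buf[n++] = 0xff;
	buf[n++] = 0xe3;

	return n;
}
```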

> > +	/* permission: XOM (PROT_EXEC only) */
> > +	ret = os_protect_memory(0, 0x1000, 0, 0, 1);
> 
> (UM_)PAGE_SIZE?

will fix it too.

-- Hajime

