[PATCH] uml/helper: Fix stack alignment

Sun Apr 18 05:56:03 BST 2021

On Sat, Apr 17, 2021 at 2:40 PM Johannes Berg <johannes at sipsolutions.net> wrote:
> How about the same pattern in start_userspace() and new_thread()?
> Doesn't matter for some reason? Or just didn't hit it yet because
> there's no 16-byte "thing" on the stack?

I see. I was not aware they were there. Didn't hit :)

* start_userspace -- yes, the exact same pattern via clone.
* new_thread -- no, this is a longjmp buffer that sets the instruction
pointer and stack pointer directly. It emulates a call instruction, as
if the return address has already been pushed. Let me give an example.

The clone trace works like:

  ; 16-aligned
  callq  *%rax ; push 8
  push   %rbp ; push 8
  ; 16 aligned
  mov    %rsp,%rbp

However, for a longjmp buffer, it will jump directly to `push %rbp`,
so a 8-offset is needed to account for the missing offset from `callq
*%rax`.

In my case, if I remove this offset from new_thread, init_thread_regs,
and start_idle_thread, it will segfault loop in ZSTD_getParams while
initializing btrfs.

Other places I found similar patterns:
* init_thread_regs & start_idle_thread -- no,  longjmp buffers, same
as new_thread.
* ubd_driver_init -- yes, same as run_helper.
* set_sigstack -- This one is interesting:
  * um on native x86_64: a quick test reveals that whether the offset
is used does not affect the stack pointer. In the code, upon signal
delivery (in the host kernel) get_sigframe calls align_sigframe, so
unaligned stack pointers make no difference.
  * um on um x86_64: I'm having trouble testing um within um, getting
a weird error ("start_userspace : expected SIGSTOP, got status = 2943"
when starting init, might try to debug later), but the code in
handle_signal also aligns the stack.
  * The man page of sigaltstack and a quick google search shows that
the conventional usage of sigaltstack does not reduce the size of
stack by sizeof(void *), so I think it's probably best to follow the
convention here.
* stub_clone_handler -- This looks fragile:
  * As far as I understand, it splits the STUB_DATA page into half,
the higher-address half for the parent to run the stub (from
init_thread_regs) and the lower half are switched to in the child.
This is fragile. The clone syscall to the host is done in the middle
of a function body and we are depending on the compiler optimizations
that it won't store any shared variables on stack that would be
missing once the stack pointer is set to something else. Similarly,
the base pointer is still shared between the two processes. It's also
dependent on optimizations to say shared variables won't overwrite
each other.
  * I'm thinking, considering that we don't have CLONE_VM set in the
stub, is it possible to force the child to map a private page for use
as a stack, so modifications to the stack done inside the child are
not seen by the parent?
  * That said, it's still potentially trouble-causing if during the
body of the function the stack pointer isn't aligned. I can't think of
any case where GCC would want to use aligned SSE instructions in the
stub, however.

I also realized, I've been compiling um with CONFIG_FRAME_POINTER,
hence the use of rbp. I just also checked how if the bug can be
reproduced without CONFIG_FRAME_POINTER, and yes, the same segfault
loop, and the assembly just bases the pointers off of rsp rather than
rbp, still assuming the 16 byte alignment.

YiFei Zhu