[PATCH 1/6] arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline
Dave Martin
Dave.Martin at arm.com
Tue Jun 23 05:59:56 EDT 2020
On Tue, Jun 23, 2020 at 09:54:31AM +0100, Will Deacon wrote:
> Commit 7e9f5e6629f6 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results
> in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc
> unwinder to unwind out of signal handlers using the .eh_frame information
> populated by our .cfi directives. In conjunction with a4eb355a3fda
> ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has
> been shown to cause segmentation faults originating from within the
> unwinder during thread cancellation:
>
> | Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault.
> | 0x0000000000435e24 in uw_frame_state_for ()
> | (gdb) bt
> | #0 0x0000000000435e24 in uw_frame_state_for ()
> | #1 0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 ()
> | #2 0x00000000004374d8 in _Unwind_ForcedUnwind ()
> | #3 0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121
> | #4 0x0000000000429808 in __do_cancel () at ./pthreadP.h:304
> | #5 sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200
> | #6 sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165
> | #7 <signal handler called>
> | #8 futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
>
> After considerable bashing of heads, it appears that our CFI directives
> for unwinding out of the sigreturn trampoline are only processed by libgcc
> when both a .eh_frame_hdr section is present *and* the mysterious NOP is
> covered by an entry in .eh_frame. With both of these now in place, it has
> highlighted that our CFI directives are not comprehensive enough to
> restore the stack pointer of the interrupted context. This results in libgcc
> falling back to an arm64-specific unwinder after computing a bogus PC value
> from the unwind tables. The unwinder promptly dereferences this bogus address
> in an attempt to see if the pointed-to instruction sequence looks like
> the sigreturn trampoline.
>
> Restore the old unwind behaviour, which relied solely on heuristics in
> the unwinder, by removing the .eh_frame_hdr section from the vDSO and
> commenting out the insufficient CFI directives for now. Add comments to
> explain the current, miserable state of affairs.
>
> Cc: Vincenzo Frascino <vincenzo.frascino at arm.com>
> Cc: Tamas Zsoldos <tamas.zsoldos at arm.com>
> Cc: Szabolcs Nagy <szabolcs.nagy at arm.com>
> Cc: Catalin Marinas <catalin.marinas at arm.com>
> Cc: Daniel Kiss <daniel.kiss at arm.com>
> Cc: Dave Martin <dave.martin at arm.com>
> Reported-by: Ard Biesheuvel <ardb at kernel.org>
> Signed-off-by: Will Deacon <will at kernel.org>
> ---
> arch/arm64/kernel/vdso/Makefile | 2 +-
> arch/arm64/kernel/vdso/sigreturn.S | 54 +++++++++++++++++++-----------
> 2 files changed, 35 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
> index 556d424c6f52..4a2e06181d80 100644
> --- a/arch/arm64/kernel/vdso/Makefile
> +++ b/arch/arm64/kernel/vdso/Makefile
> @@ -24,7 +24,7 @@ btildflags-$(CONFIG_ARM64_BTI_KERNEL) += -z force-bti
> # routines, as x86 does (see 6f121e548f83 ("x86, vdso: Reimplement vdso.so
> # preparation in build-time C")).
> ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 --hash-style=sysv \
> - -Bsymbolic --eh-frame-hdr --build-id -n $(btildflags-y) -T
> + -Bsymbolic --build-id -n $(btildflags-y) -T
>
> ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18
> ccflags-y += -DDISABLE_BRANCH_PROFILING
> diff --git a/arch/arm64/kernel/vdso/sigreturn.S b/arch/arm64/kernel/vdso/sigreturn.S
> index 620a3ef837b7..0e18729abc3b 100644
> --- a/arch/arm64/kernel/vdso/sigreturn.S
> +++ b/arch/arm64/kernel/vdso/sigreturn.S
> @@ -18,29 +18,40 @@
>
> .text
>
> +/*
> + * NOTE!!! You may notice that all of the .cfi directives in this file have
> + * been commented out. This is because they have been shown to trigger segfaults
> + * in libgcc when unwinding out of a SIGCANCEL handler to invoke pthread
> + * cleanup handlers during the thread cancellation dance. By omitting the
> + * directives, we trigger an arm64-specific fallback path in the unwinder which
> + * recognises the signal frame and restores many of the registers directly from
> + * the sigcontext. Re-enabling the cfi directives here therefore needs to be
> + * much more comprehensive to reduce the risk of further regressions.
> + */
> +
> /* Ensure that the mysterious NOP can be associated with a function. */
> - .cfi_startproc
> +// .cfi_startproc
>
> /*
> - * .cfi_signal_frame causes the corresponding Frame Description Entry in the
> - * .eh_frame section to be annotated as a signal frame. This allows DWARF
> - * unwinders (e.g. libstdc++) to implement _Unwind_GetIPInfo(), which permits
> - * unwinding out of the signal trampoline without the need for the mysterious
> - * NOP.
> + * .cfi_signal_frame causes the corresponding Frame Description Entry (FDE) in
> + * the .eh_frame section to be annotated as a signal frame. This allows DWARF
> + * unwinders (e.g. libstdc++) to implement _Unwind_GetIPInfo() and identify
> + * the next frame using the unmodified return address instead of subtracting 1,
> + * which may yield the wrong FDE.
> */
> - .cfi_signal_frame
> +// .cfi_signal_frame
>
> /*
> * Tell the unwinder where to locate the frame record linking back to the
> - * interrupted context. We don't provide unwind info for registers other
> - * than the frame pointer and the link register here; in practice, this
> - * is sufficient for unwinding in C/C++ based runtimes and the values in
> - * the sigcontext may have been modified by this point anyway. Debuggers
> + * interrupted context. We don't provide unwind info for registers other than
> + * the frame pointer and the link register here; in practice, this is likely to
> + * be insufficient for unwinding in C/C++ based runtimes, especially without a
> + * means to restore the stack pointer. Thankfully, unwinders and debuggers
> * already have baked-in strategies for attempting to unwind out of signals.
> */
> - .cfi_def_cfa x29, 0
> - .cfi_offset x29, 0 * 8
> - .cfi_offset x30, 1 * 8
> +// .cfi_def_cfa x29, 0
> +// .cfi_offset x29, 0 * 8
> +// .cfi_offset x30, 1 * 8
I guess this will help is to demonstrate that nobody is relying on
detailed annotations here...
> /*
> * This mysterious NOP is required for some unwinders (e.g. libc++) that
> @@ -51,16 +62,19 @@
> nop // Mysterious NOP
>
> /*
> - * GDB relies on being able to identify the sigreturn instruction sequence to
> - * unwind from signal handlers. We cannot, therefore, use SYM_FUNC_START()
> - * here, as it will emit a BTI C instruction and break the unwinder. Thankfully,
> - * this function is only ever called from a RET and so omitting the landing pad
> - * is perfectly fine.
> + * GDB, libgcc and libunwind rely on being able to identify the sigreturn
> + * instruction sequence to unwind from signal handlers. We cannot, therefore,
> + * use SYM_FUNC_START() here, as it will emit a BTI C instruction and break the
> + * unwinder. Thankfully, this function is only ever called from a RET and so
> + * omitting the landing pad is perfectly fine.
> */
> SYM_CODE_START(__kernel_rt_sigreturn)
> +// PLEASE DO NOT MODIFY
> mov x8, #__NR_rt_sigreturn
> +// PLEASE DO NOT MODIFY
> svc #0
> - .cfi_endproc
> +// PLEASE DO NOT MODIFY
> +// .cfi_endproc
> SYM_CODE_END(__kernel_rt_sigreturn)
FWIW,
Acked-by: Dave Martin <Dave.Martin at arm.com>
More information about the linux-arm-kernel
mailing list