[PATCH 0/6] Fix unwinding through sigreturn trampolines

Tue Jun 23 07:08:52 EDT 2020

On Tue, 23 Jun 2020 at 12:58, Will Deacon <will at kernel.org> wrote:
>
> On Tue, Jun 23, 2020 at 12:11:24PM +0200, Ard Biesheuvel wrote:
> > On Tue, 23 Jun 2020 at 11:54, Dave Martin <Dave.Martin at arm.com> wrote:
> > > In the longer term, we should perhaps keep .cfi_signal_frame, and ditch
> > > all the other annotations.  If the unwinder knows for sure it's the
> > > signal frame (and understands this annotation), it can use its own
> > > knowledge to do the right thing.  Otherwise, it can fall back on its
> > > other heuristics.
> >
> > What we found out is that libunwind unconditionally uses its own
> > unwind strategy when it spots the mov/svc instruction pair at the
> > return address, and so what we put here does not matter at all.
> > The libgcc case is different, as it only applies the heuristics if it
> > doesn't find any unwind info covering the return address. So this is
> > what bit us here: by fixing the way the CFI directives are emitted,
> > libgcc no longer uses its own baked in unwind strategy for the signal
> > frame, but interprets the one we provide, which results in a bogus
> > address for the /next/ frame. Since that is not covered by unwind info
> > either, libgcc attempts to apply the heuristics at this level, and
> > explodes because the bogus address cannot be dereferenced to get at
> > the opcodes.
> >
> > So libgcc doesn't even check whether it looks like a signal frame
> > unless the unwind info is missing.
> >
> > > This is probably better than having rich annotations that are
> > > pedantically correct, but only half-supported by unwinders out in the
> > > wild.
> > >
> >
> > Agreed. But without providing any annotations, other unwinders will
> > fail to unwind through the sigreturn trampoline, which is why the
> > change was made in the first place.
> >
> > So the safest way to fix this (imho) is to add CFI directives that
> > mirror what libunwind does unconditionally, and libgcc when it spots
> > the mov/svc pair on unannotated addresses, which is to restore all the
> > registers from the sigframe, and to use regs->pc as the return address
> > (which is where the unwind should proceed, rather than at whatever
> > value happened to reside in x30 at the point the signal was taken)
> >
> > The only hairy bit is the FP/SIMD registers, which are ignored by
> > libunwind, and only half-restored by libgcc (which appears to assume
> > that only d8..d15 need to be restored). This means throwing a C++
> > exception in a signal handler and catching it outside of it is going
> > to be difficult to get 100% correct, but this is something that
> > appears to be broken already everywhere. (Unless there are some
> > AAPCS/ABI rules that specify that FP/SIMD registers have to be assumed
> > clobbered when catching an exception)
>
> I think Dave makes an interesting point about .cfi_signal_frame, though.
> With any released arm64 kernel, unwinding here requires heuristics to
> identify the signal frame. The usual approach seems to pattern match the
> instruction stream (and I'm not convinced libunwind works on big-endian
> here), but you could also have looked up pc+4 to see if the function was
> marked with .cfi_signal_frame. You can't do that anymore after this patch.

Good point.

> We have no evidence anybody bothered with that (it seems like more trouble
> than it's worth), but it's an interesting observation. Furthermore, maybe
> the right answer *is* to punt this all to the unwinder given that we're
> already forced to keep the instruction sequence exactly as it is. I suspect
> the LLVM unwinder has to use heuristics for most other architectures
> anyway, so it should already have whatever hooks it needs.
>

The unwinder is definitely better equipped to deal with this, given
how difficult it is to come up with CFI directives at compile time
that apply universally to any kind of sigframe we may instantiate at
runtime.

> It would be a different story if most unwinders were expecting to use
> CFI and it generally worked well on other architectures, but that really
> doesn't appear to be the case.
>

If fixing LLVM unwinding is no longer a requirement, reverting to what
we had is probably the best, although it is rather unfortunate that we
will be encouraging the LLVM developers to put another bodge in user
space. Could we at least encourage them to take the libgcc approach,
where the builtin strategy is only used as a fallback if the return
address is not covered by unwind info? Otherwise, we are painting
ourselves into a corner even more than we have already with respect to
our ability to ever start doing this properly in the future.