[PATCH 0/6] Fix unwinding through sigreturn trampolines

Will Deacon will at kernel.org
Tue Jun 23 10:34:43 EDT 2020


On Tue, Jun 23, 2020 at 02:56:14PM +0100, Szabolcs Nagy wrote:
> The 06/23/2020 14:20, Will Deacon wrote:
> > On Tue, Jun 23, 2020 at 12:11:09PM +0100, Szabolcs Nagy wrote:
> > > as for thread cancellation in glibc: it uses exception
> > > mechanism for cleanups, but the default cancel state
> > > is PTHREAD_CANCEL_DEFERRED which means only blocking
> > > libc calls throw (so -fexceptions is enough and the
> > > libgcc logic is fine), if you switch to
> > > PTHREAD_CANCEL_ASYNCHRONOUS then there may be a problem
> > > but you can only do pure computations in that state,
> > > (only 3 libc functions are defined to be async cancel
> > > safe), i think you cannot register cleanup handlers
> > > that run on the same stack frame that may be async
> > > interrupted.
> > 
> > Ah, I was trying to print a message, so I suppose that's out. Even so,
> > debugging with gdb and putting a breakpoint on the callback showed that
> > it wasn't getting invoked.
> > 
> > My code is below just as an FYI, since being able to derive a test from
> > this would be useful should we try to fix the CFI directives in future.
> > 
> > I get different results based on different combinations of
> > architecture, toolchain and optimisation level.
> 
> with -fexceptions gcc only emits the cleanup begin/end
> labels around function calls, i.e. it only expects a throw
> from functions (the cleanup handler is called if the pc is
> between the begin/end labels during unwind), if an
> instruction is interrupted and you throw from there then
> cleanup may work if the instruction happens to be in the
> range covered by the begin/end labels, but gcc does not
> try to make that happen.

Interesting. That's not mentioned anywhere in the pthread_cleanup_push
man page!
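Here is a minimal sketch of the kind of cleanup those ranges belong to. It uses the C `cleanup` variable attribute supported by GCC and Clang. My understanding, worth checking against glibc's <pthread.h>, is that pthread_cleanup_push expands to something similar in C when __EXCEPTIONS is defined. `blocking_call` and `scoped_demo` are hypothetical names. The sketch only exercises an ordinary scope exit. The point from above is that under -fexceptions an unwind finds these handlers only if the pc being unwound from falls inside a range GCC emitted around a call.

```c
#include <stdio.h>

static int cleanups_run;
static int last_tag;

/* GCC/Clang pass a pointer to the variable that is leaving scope. */
static void on_scope_exit(int *tag) {
    printf("Cleanup handler called %#x\n", *tag);
    last_tag = *tag;
    cleanups_run++;
}

/* Hypothetical stand-in for a libc call that could act on cancellation. */
static void blocking_call(void) { }

int scoped_demo(void) {
    cleanups_run = 0;
    {
        int outer __attribute__((cleanup(on_scope_exit))) = 0x1;
        {
            int inner __attribute__((cleanup(on_scope_exit))) = 0x2;
            /* With -fexceptions, GCC emits the cleanup range for 'inner'
               and 'outer' around calls like this one, not around
               arbitrary instructions. */
            blocking_call();
        }
    }
    return cleanups_run;
}
```

Calling scoped_demo() runs the inner handler (0x2) before the outer one (0x1). This is the same order as the output quoted below.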

> with -fnon-call-exceptions i think the test is supposed
> to work and here it works, i get:
> 
> Cleanup handler called 0x2
> Cleanup handler called 0x1

Thanks, that's much better. I could even replace baz with some out-of-line
assembly:

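	.text
	.align	2
	.globl	baz
	.type	baz, %function
baz:
	.cfi_startproc
	// Stash the frame pointer and link register in x0/x1 and tell the
	// unwinder where the caller's values now live.
	mov	x0, x29
	.cfi_register x29, x0
	mov	x1, x30
	.cfi_register x30, x1
	// Clobber x29/x30 so that only the CFI can recover the caller.
	mov	x29, #42
	mov	x30, #42
	// Spin here until the thread is (asynchronously) cancelled.
	b	.
	ret			// never reached
	.cfi_endproc
	.size	baz, .-baz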

and I can see the unwinder segfaulting if I remove the CFI directives.

> i think posix does not allow pthread_cleanup_push in
> async cancel state (but you can change the cancel
> state before and after it, which is valid i think),

That makes sense, to avoid racing with a signal while the handlers are being
installed, I suppose.
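The following sketch shows the pattern Szabolcs describes, assuming glibc or another POSIX threads implementation. The cleanup handler is pushed while the thread is still in the default deferred type. The type is switched to PTHREAD_CANCEL_ASYNCHRONOUS only around pure computation; pthread_setcanceltype is itself one of the three required async-cancel-safe functions. The type is restored before any other libc call. `worker`, `run_demo` and the semaphore handshake are hypothetical. The handshake makes the main thread cancel only after the asynchronous window has closed. As a result, the handler runs from the deferred cancellation point in pause(), and the example does not depend on unwinding through a signal frame.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

static sem_t computed;               /* posted once the async window has closed */
static volatile int cleanup_ran;     /* set by the cleanup handler */
static volatile unsigned long result;

static void cleanup(void *arg) {
    printf("Cleanup handler called %p\n", arg);
    cleanup_ran = 1;
}

static void *worker(void *unused) {
    int oldtype;
    (void)unused;

    /* Install the handler while still in the default (deferred) type. */
    pthread_cleanup_push(cleanup, (void *)0x1);

    /* Asynchronous only around pure computation: no libc calls inside. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &oldtype);
    unsigned long sum = 0;
    for (unsigned long i = 1; i <= 1000; i++)
        sum += i;
    pthread_setcanceltype(oldtype, &oldtype);

    result = sum;
    sem_post(&computed);

    /* Deferred again: cancellation is acted on inside pause(). */
    for (;;)
        pause();

    pthread_cleanup_pop(0);  /* unreachable, but push/pop must pair lexically */
    return NULL;
}

int run_demo(void) {
    pthread_t t;
    void *ret;

    cleanup_ran = 0;
    sem_init(&computed, 0, 0);
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    sem_wait(&computed);     /* cancel only after the async window */
    pthread_cancel(t);
    pthread_join(t, &ret);
    sem_destroy(&computed);
    return ret == PTHREAD_CANCELED && cleanup_ran;
}
```

run_demo() returns 1 when the thread was cancelled and its handler ran. On older glibc this needs to be built with -pthread.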

> i think printf is valid in the cleanup handler:
> the cancel state is reset (and cancellation is disabled)
> when libc acts on cancellation. (and if the interrupted
> code was async cancel safe it should work).
> 
> (that said i've seen issues with -fnon-call-exceptions
> so i consider the musl cancellation design more robust:
> just add the cleanup ptr to a libc internal list that
> is called on cancellation, no unwinding is involved.
> this does not work with c++ dtors though, but c++ never
> defined dtor vs posix cancellation semantics so
> cancelling c++ code is just undefined.)

Yes, that makes a tonne more sense to me. I couldn't figure out why
unwinding was necessary at all for this, since the context for the longjmp
is stored in TLS *anyway* afaiu.
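The following is a toy model of the list-based design. It follows Szabolcs's description rather than musl's actual code, and all names are hypothetical. Each push links a record from the caller's stack frame into a thread-local list. Acting on cancellation then just walks that list newest-first, so no unwind tables or CFI are consulted. A real implementation would also have to keep the list consistent if a signal arrives mid-push, and would exit the thread afterwards.

```c
#include <stdint.h>
#include <stdio.h>

/* One record per push; it lives in the pushing function's stack frame. */
struct cleanup_rec {
    void (*fn)(void *);
    void *arg;
    struct cleanup_rec *next;
};

static _Thread_local struct cleanup_rec *cleanup_head;

static void cleanup_push(struct cleanup_rec *rec, void (*fn)(void *), void *arg) {
    rec->fn = fn;
    rec->arg = arg;
    rec->next = cleanup_head;
    cleanup_head = rec;
}

static void cleanup_pop(int execute) {
    struct cleanup_rec *rec = cleanup_head;
    cleanup_head = rec->next;
    if (execute)
        rec->fn(rec->arg);
}

/* What "acting on cancellation" amounts to here: run every pending
   handler, newest first, by walking the list. No unwinding involved. */
static void run_cleanups(void) {
    while (cleanup_head)
        cleanup_pop(1);
}

static int order[4];
static int norder;

static void record(void *arg) {
    printf("Cleanup handler called %p\n", arg);
    order[norder++] = (int)(intptr_t)arg;
}

int cleanup_demo(void) {
    struct cleanup_rec outer, inner;
    norder = 0;
    cleanup_push(&outer, record, (void *)0x1);
    cleanup_push(&inner, record, (void *)0x2);
    run_cleanups();  /* stands in for libc acting on a cancellation request */
    return norder;
}
```

cleanup_demo() runs the 0x2 handler before the 0x1 handler. This is the same LIFO order glibc produced above, without touching the unwinder.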

Will



More information about the linux-arm-kernel mailing list