[PATCH] arm64: Apply dynamic shadow call stack patching in two passes

Thu Jan 26 11:07:56 PST 2023

On Tue, Dec 13, 2022 at 6:29 AM Ard Biesheuvel <ardb at kernel.org> wrote:
>
> Due to past bad experiences with the highly complex and overengineered
> DWARF standard that describes the unwind metadata that we are using to
> locate these instructions [..]

Just a note on why I distrust DWARF data so much - it's not so much
because it's complex and overengineered (although I agree it is),
because that's not anything new.

It's because it's almost entirely *untested*.

Compiler code generation bugs are a real issue, and happen
semi-regularly. Are they _common_? No. But they are an issue, and we
were chasing one just a couple of weeks ago.

But code generation bugs are things that get very fundamentally
tested. When the compiler generates bad code, every single user of
that compiler will effectively be testing it.

Yes, we still hit them, often because the kernel does something
unusual (i.e. the last one was apparently only triggered with a
combination of sanitizer and coverage flags), so "test coverage" isn't
any kind of guarantee, but it's there.

But DWARF debug info? It can be *completely* wrong, and in 99.9% of
all cases nobody will ever notice in any testing. Most of the time
it's not used at all, and even when it is used (whether for exception
handling or for actual debugging) it's used only for a tiny, tiny
percentage of the whole thing.

So it's not just that we've had bad experiences with it in the past. I
feel that the problem goes deeper than that - the lack of testing
means that it's fundamentally not trustworthy.

Am I exaggerating a bit? Sure. Compilers have (extensive) test-suites
for debug info too. But I do think that coverage tends to be much less
than "everybody relies on it being right" like for normal code
generation.

End result: I would love for us to have some additional security nets
in this area.

Doing the checks as a dry-run phase is good so that any possible
issues hopefully get caught before the code actively rewrites things,
but I'd still be even happier if this was a build-time thing and part
of objtool or something.

That way the DWARF info would also be validated even when it's not
actively used - which is a large part of my "this has seldom been
tested" issue with it.

Because I *think* this dry-run thing is only run on the (few) arm64
cores that actually have PACIASP/AUTIASP. No?
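
To make the dry-run idea concrete, here is a minimal sketch of a
two-pass walk, not the actual patch. The helper names (scs_walk,
scs_patch_two_pass) and the flat array of instruction indexes standing
in for the DWARF-derived locations are hypothetical, and the
instruction encodings are my best understanding of the A64 values, so
treat them as assumptions too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A64 encodings (assumed, not taken from the patch under discussion). */
#define PACIASP  0xd503233fu
#define AUTIASP  0xd50323bfu
#define SCS_PUSH 0xf800865eu  /* str x30, [x18], #8   */
#define SCS_POP  0xf85f8e5eu  /* ldr x30, [x18, #-8]! */

/*
 * Hypothetical helper: visit every instruction index that the unwind
 * metadata claims holds PACIASP or AUTIASP. With dry_run set, only
 * verify the claim; otherwise also rewrite the instruction.
 * Returns 0 on success, -1 on the first location that does not match.
 */
static int scs_walk(uint32_t *text, size_t text_len,
                    const size_t *locs, size_t nlocs, int dry_run)
{
    for (size_t i = 0; i < nlocs; i++) {
        if (locs[i] >= text_len)
            return -1;          /* metadata points outside the text */

        uint32_t *insn = &text[locs[i]];

        if (*insn != PACIASP && *insn != AUTIASP)
            return -1;          /* metadata disagrees with the code */

        if (!dry_run)
            *insn = (*insn == PACIASP) ? SCS_PUSH : SCS_POP;
    }
    return 0;
}

/* Pass 1 validates everything; pass 2 runs only if pass 1 was clean. */
static int scs_patch_two_pass(uint32_t *text, size_t text_len,
                              const size_t *locs, size_t nlocs)
{
    assert(nlocs == 0 || locs != NULL);

    if (scs_walk(text, text_len, locs, nlocs, 1))
        return -1;              /* nothing has been modified yet */

    return scs_walk(text, text_len, locs, nlocs, 0);
}
```

The point of the split is that a single bogus location makes the first
pass fail before any instruction has been touched, instead of leaving
the text half-patched.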

Without test coverage, bugs happen.

                   Linus