[REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
Thomas Gleixner
tglx at kernel.org
Thu Apr 23 14:03:23 PDT 2026
On Thu, Apr 23 2026 at 10:41, Linus Torvalds wrote:
> If that rule was actually an important part of the ABI, it shouldn't
> have been a debug thing.
It's a debug thing because it's too expensive to be enabled by
default. And it's actually valuable for validating the ABI correctness
of RSEQ critical sections, as they can't be single stepped with a
debugger because the breakpoint interruption would immediately cancel
them.
> So:
>
> (a) the debug code in question needs to just be removed, since it's
> now actively detrimental, and means that any kernel developer who
> *does* enable it can't actually test this case any more. It's checking
> for something that has been shown to not be true.
>
> (b) we need to fix this (revert if it can't be fixed otherwise)
>
> I see some patches flying around, but am not clear on whether there
> was an actual patch that make this work again?
There are two issues:
1) ARM64
On ARM64 RSEQ got broken completely with the partial move to the
generic entry code. There are patches flying around which "fix" it
and Mark is working on a more complete solution as there are other
subtle issues with that aside from the obvious RSEQ wreckage. The
latter could have been detected with the existing RSEQ selftests if
any CI would actually run them on -next.
That's uninteresting and unrelated to the tcmalloc issue. It's just
a boring bug which will be fixed in the next couple of days.
2) The tcmalloc problem
That has been a known problem for at least 6 years. tcmalloc assumes that
it "owns" rseq and can do whatever it wants with it.
In 2022 the glibc people requested that tcmalloc become
interoperable with the reasonable expectation of glibc to utilize
rseq as well:
https://github.com/google/tcmalloc/issues/144
Status unresolved.
That means that using tcmalloc requires telling glibc to _NOT_ use
rseq and at the same time precludes any other library which wants
to use it for the documented purposes. So this code sequence blows
up in your face:
x = tcmalloc();
dostuff(x);
evaluate(rseq::cpu_id_start, rseq::cpu_id);
because tcmalloc overwrites rseq::cpu_id_start and thereby breaks
the ABI which evaluate() is rightfully depending on.
That has absolutely nothing to do with the kernel as there is no
kernel interaction between tcmalloc's abuse and the subsequent
evaluation of rseq::cpu_id_start. The kernel has no way to fix that
problem at all.
Now back to your generally correct and agreed on "observed
behaviour" rule.
Feel free to enforce it, but be aware that you thereby set a
precedent that a single abuser can then rightfully own a general
shared interface of the kernel forever and force everybody else to
give up.
The tcmalloc developers actually documented that they own the
world:
// Note: this makes __rseq_abi.cpu_id_start unusable for its original purpose.
Do you seriously want to proliferate that?
Thanks,
tglx