[REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere

Thomas Gleixner tglx at kernel.org
Thu Apr 23 14:03:23 PDT 2026


On Thu, Apr 23 2026 at 10:41, Linus Torvalds wrote:
> If that rule was actually an important part of the ABI, it shouldn't
> have been a debug thing.

It's a debug thing because it's too expensive to be enabled by
default. And it's actually valuable for validating RSEQ critical section
ABI correctness, as critical sections can't be single stepped with a
debugger: the breakpoint interruption would immediately cancel them.

> So:
>
>  (a) the debug code in question needs to just be removed, since it's
> now actively detrimental, and means that any kernel developer who
> *does* enable it can't actually test this case any more. It's checking
> for something that has been shown to not be true.
>
>  (b) we need to fix this (revert if it can't be fixed otherwise)
>
> I see some patches flying around, but am not clear on whether there
> was an actual patch that make this work again?

There are two issues:

  1) ARM64

     On ARM64 RSEQ got broken completely with the partial move to the
     generic entry code. There are patches flying around which "fix" it
     and Mark is working on a more complete solution as there are other
     subtle issues with that aside from the obvious RSEQ wreckage. The
     latter could have been detected with the existing RSEQ selftests if
     any CI actually ran them on -next.

     That's uninteresting and unrelated to the tcmalloc issue. It's just
     a boring bug which will be fixed in the next couple of days.


  2) The tcmalloc problem

     That has been a known problem for at least 6 years. tcmalloc assumes
     that it "owns" rseq and can do whatever it wants with it.

     In 2022 the glibc people requested that tcmalloc become
     interoperable with the reasonable expectation of glibc to utilize
     rseq as well:

          https://github.com/google/tcmalloc/issues/144

     Status unresolved.

     That means that using tcmalloc requires telling glibc to _NOT_ use
     rseq and at the same time precludes any other library which wants
     to use it for the documented purposes. So this code sequence blows
     up in your face:

        x = tcmalloc();
        dostuff(x)
          evaluate(rseq::cpu_id_start, rseq::cpu_id)

     because tcmalloc overwrites rseq::cpu_id_start and thereby breaks
     the ABI which evaluate() is rightfully depending on.

     That has absolutely nothing to do with the kernel as there is no
     kernel interaction between tcmalloc's abuse and the subsequent
     evaluation of rseq::cpu_id_start. The kernel has no way to fix that
     problem at all.
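
     The sequence above can be sketched in user space without touching
     the kernel at all. This is only an illustration under stated
     assumptions: MockRseq is a two-field stand-in and not the real
     struct rseq, and allocator_clobber(), evaluate(), NR_CPUS and the
     clobber value are hypothetical names and numbers, not tcmalloc or
     glibc code. A real consumer may index per-CPU data with
     cpu_id_start without any range check; the check here exists only
     to make the breakage visible instead of crashing.

```python
import ctypes

NR_CPUS = 4  # assumed CPU count for this sketch


class MockRseq(ctypes.Structure):
    """Stand-in for the two fields discussed above; NOT the real struct rseq."""

    _fields_ = [
        ("cpu_id_start", ctypes.c_uint32),
        ("cpu_id", ctypes.c_uint32),
    ]


def allocator_clobber(rs, private_value):
    """Hypothetical allocator that reuses cpu_id_start as private storage."""
    rs.cpu_id_start = private_value


def evaluate(rs, per_cpu_counters):
    """Hypothetical consumer that treats cpu_id_start as a per-CPU index.

    Returns True if the index was usable, False if it was out of range.
    """
    cpu = rs.cpu_id_start
    if cpu >= len(per_cpu_counters):
        return False
    per_cpu_counters[cpu] += 1
    return True


def demo():
    counters = [0] * NR_CPUS
    rs = MockRseq(cpu_id_start=2, cpu_id=2)

    before = evaluate(rs, counters)    # field still holds a CPU number
    allocator_clobber(rs, 0x80000000)  # no system call, no kernel entry
    after = evaluate(rs, counters)     # field no longer holds a CPU number

    return before, after, counters


if __name__ == "__main__":
    print(demo())
```

     Nothing between the clobbering store and the later read enters the
     kernel, which is why the kernel cannot repair the value in between.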

     Now back to your generally correct and agreed on "observed
     behaviour" rule.

     Feel free to enforce it, but be aware that you thereby set a
     precedent that a single abuser can then rightfully own a general
     shared interface of the kernel forever and force everybody else to
     give up.

     The tcmalloc developers actually documented that they own the
     world:

     // Note: this makes __rseq_abi.cpu_id_start unusable for its original purpose.

     Do you seriously want to proliferate that?

Thanks,

        tglx






More information about the linux-arm-kernel mailing list