[REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere

Chris Kennelly ckennelly at google.com
Thu Apr 23 10:38:20 PDT 2026


On Thu, Apr 23, 2026 at 1:19 PM Thomas Gleixner <tglx at kernel.org> wrote:
>
> On Thu, Apr 23 2026 at 14:11, Mathias Stearn wrote:
>
> Cc+ Linus
>
> > Of course, even if we make that change, it will only apply to _future_
> > binaries. That's why we prefer a kernel fix so that users will be able
> > to run our existing releases (or any containers that use them) on a
> > modern kernel.
>
> I understand that and as everyone else I would be happy to do that, but
> the price everyone pays for proliferating the tcmalloc insanity is not
> cheap either.
>
> So let me recap the whole situation and how we got there:
>
>   1) The original RSEQ implementation updates the rseq::cpu_id_start
>      field in user space more or less unconditionally on every exit to
>      user, whether the CPU/MMCID have been changed or not.
>
>      That went unnoticed for years because nothing used rseq aside from
>      Google and tcmalloc. Once glibc registered rseq, this resulted in an
>      up to 15% performance penalty for syscall-heavy workloads.
>
>   2) The rseq::cpu_id_start field is documented as read only for user
>      space in the ABI contract and guaranteed to be updated by the
>      kernel when a task is migrated to a different CPU.
>
>   3) The RO for userspace property has been enforced by RSEQ debugging
>      mode since day one. If such a debug-enabled kernel detects user
>      space changing the field, it kills the task/application.

The optimization in TCMalloc that you're describing has been available
since September 2023:
https://github.com/google/tcmalloc/commit/aaa4fbf6fcdce1b7f86fcadd659874645c75ddb9

I thought the RSEQ debug checks were added in December 2024:
https://github.com/torvalds/linux/commit/7d5265ffcd8b41da5e09066360540d6e0716e9cd,
but perhaps I misidentified the ones in question.
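
For readers following along, the read-only contract in points 2) and 3)
above can be sketched as a small user-space model. This is not kernel
code: `struct rseq_model` is a simplified stand-in for the UAPI
`struct rseq` in <linux/rseq.h> (real layout, alignment and newer fields
omitted), and `struct task_model`, `rseq_debug_check()` and
`read_cpu_id_start()` are hypothetical names. The sketch assumes the
debug check amounts to comparing the user-visible field with the value
the kernel last stored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the UAPI struct rseq; only the fields discussed
 * in this thread, without the real layout or alignment requirements. */
struct rseq_model {
    uint32_t cpu_id_start; /* read-only for user space per the ABI */
    uint32_t cpu_id;
    uint64_t rseq_cs;
};

/* Hypothetical per-task kernel state: the cpu_id_start value the kernel
 * itself last stored into the user-space area. */
struct task_model {
    struct rseq_model *rs;
    uint32_t kernel_cpu_id_start;
};

/* Model of the debug check from point 3): a mismatch means user space
 * wrote to the read-only field, and a debug kernel would kill the task.
 * Returns true if the task may continue. */
static bool rseq_debug_check(const struct task_model *t)
{
    return t->rs->cpu_id_start == t->kernel_cpu_id_start;
}

/* Conforming user space only loads the field and never stores to it.
 * The volatile access makes the compiler re-read memory on every call. */
static uint32_t read_cpu_id_start(const struct rseq_model *rs)
{
    return *(const volatile uint32_t *)&rs->cpu_id_start;
}
```

A library that only ever calls something like `read_cpu_id_start()`
passes the check; any store to the field trips it.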

>
>   4) tcmalloc abused the suboptimal implementation (see #1) and
>      scribbled over rseq::cpu_id_start for their own nefarious purposes.
>
>   5) As a consequence of #4 tcmalloc cannot be used on an RSEQ
>      debug-enabled kernel. This means a developer cannot validate his
>      RSEQ code against a debug kernel when tcmalloc is in use on the
>      system, as that would crash the tcmalloc-dependent applications
>      due to #3.
>
>   6) As a consequence of #4 tcmalloc cannot be used together with any
>      other facility/library which wants to utilize the ABI guaranteed
>      properties of rseq::cpu_id_start in the same application.
>
>   7) tcmalloc has violated the ABI since day one and has refused to
>      address the problem, despite being offered a kernel-side rseq
>      extension to solve it many years ago.

I know there was some discussion around a preemption notification
scheme, rseq_sched_state; but I thought the discussion moved in favor
of the timeslice extension interface that recently landed. Timeslice
extension solves some use cases, but I'm not sure it addresses this
one.
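
To make the dependency concrete, here is a hypothetical model of a
library that ORs its own marker bit into cpu_id_start and treats the
marker disappearing as "the kernel ran on this thread". It is not
tcmalloc's actual code; MARKER_BIT and all function names are
assumptions. It contrasts the unconditional update from point 1) above
with the update-only-on-change behavior described in point 8).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the UAPI struct rseq; only the field used here. */
struct rseq_model {
    uint32_t cpu_id_start;
};

/* Hypothetical marker bit a library ORs into cpu_id_start. Writing it is
 * the ABI violation: the field is documented as read-only for user space. */
#define MARKER_BIT 0x80000000u

/* Old behavior (point 1): rewrite the field on every exit to user. */
static void exit_to_user_unconditional(struct rseq_model *rs, uint32_t cpu)
{
    rs->cpu_id_start = cpu;
}

/* New behavior (point 8): rewrite only when the CPU actually changed.
 * prev_cpu models the value the kernel itself last wrote. */
static void exit_to_user_on_change(struct rseq_model *rs, uint32_t *prev_cpu,
                                   uint32_t cpu)
{
    if (cpu != *prev_cpu) {
        rs->cpu_id_start = cpu;
        *prev_cpu = cpu;
    }
}

/* User space stamps the marker ... */
static void user_set_marker(struct rseq_model *rs)
{
    rs->cpu_id_start |= MARKER_BIT;
}

/* ... and later infers "the kernel ran" from the marker being gone. */
static bool user_marker_cleared(const struct rseq_model *rs)
{
    return (rs->cpu_id_start & MARKER_BIT) == 0;
}
```

Under the unconditional update the marker is always scrubbed; under the
on-change update it survives any kernel entry that does not move the
task to another CPU, which matches the breakage described in point 8).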

>
>   8) When addressing the performance issues of RSEQ, the unconditional
>      update was dropped under the valid assumption that the kernel
>      only has to satisfy the guaranteed ABI properties, especially when
>      they are enforceable by RSEQ debug.
>
>      As a consequence this exposed the tcmalloc ABI violation, because
>      the unconditional pointless overwriting of something which did not
>      change stopped happening.
>
> Due to #4 everyone is in a hard place and up a creek without a paddle.
>
> Here are the possible solutions:
>
>   A) Mathias suggested force-overwriting rseq::cpu_id_start every time
>      the rseq::rseq_cs field is cleared by the kernel, under the not yet
>      validated theoretical assumption that this cures the problem for
>      tcmalloc.
>
>      If that's sufficient, it would be harmless performance-wise,
>      because the write would be inside the already existing STAC/CLAC
>      section and would just add some more noise to the rseq critical
>      section operations.
>
>      That would allow existing tcmalloc usage to continue, but
>      obviously would neither solve #5 and #6 above nor provide an
>      incentive for tcmalloc to actually fix their crap.
>
>   B) If that's not sufficient, then keeping tcmalloc alive would require
>      going back to the previous state and letting everyone else pay the
>      price in terms of performance overhead.
>
>   C) Declare that this is not a regression because the ABI guarantee is
>      not violated and the RO property has been enforceable by RSEQ
>      debugging since day one.
>
> In my opinion #C is the right thing to do, but I can see a case being
> made for the lightweight fix Mathias suggested (#A) _if_ and only _if_
> that is sufficient. Picking #A would also mean that user space people
> have to take up the fight against tcmalloc when they want to use the
> RSEQ guaranteed ABI along with tcmalloc in the same application or use a
> RSEQ debug kernel to validate their own code.
>
> Going back to the full unconditional nightmare (#B) is not an option at
> all, as everybody else would have to take the massive performance hit.
>
> Oh well...
>
> Thanks,
>
>         tglx
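
Option A) can be sketched with the same kind of user-space model. This is
a hypothetical illustration, not the proposed patch:
`exit_to_user_option_a()` is an invented name, and the model assumes that
a non-zero rseq_cs at kernel entry is what leads the kernel to clear that
field. As point A) itself says, whether this is sufficient for tcmalloc
has not been validated.

```c
#include <stdint.h>

/* Simplified stand-in for the UAPI struct rseq; only the fields used here. */
struct rseq_model {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
    uint64_t rseq_cs;
};

/* Hypothetical exit-to-user path under option A). prev_cpu is the value
 * the kernel last wrote; a non-zero rseq_cs means user space had a
 * critical section descriptor registered when the kernel ran. */
static void exit_to_user_option_a(struct rseq_model *rs, uint32_t *prev_cpu,
                                  uint32_t cpu)
{
    if (rs->rseq_cs != 0) {
        rs->rseq_cs = 0;
        /* Forced store, even if the CPU is unchanged: this is the
         * extra write option A) adds next to clearing rseq_cs. */
        rs->cpu_id_start = cpu;
        rs->cpu_id = cpu;
        *prev_cpu = cpu;
        return;
    }
    /* Otherwise keep the cheap update-only-on-change behavior. */
    if (cpu != *prev_cpu) {
        rs->cpu_id_start = cpu;
        rs->cpu_id = cpu;
        *prev_cpu = cpu;
    }
}
```

In the real kernel the forced store would sit in the same STAC/CLAC
user-access section that already clears rseq_cs, which is why the cost
is expected to be small. A marker written outside a critical section is
still not scrubbed in this model, which is exactly the open question.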


