[REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere
Chris Kennelly
ckennelly at google.com
Thu Apr 23 10:38:20 PDT 2026
On Thu, Apr 23, 2026 at 1:19 PM Thomas Gleixner <tglx at kernel.org> wrote:
>
> On Thu, Apr 23 2026 at 14:11, Mathias Stearn wrote:
>
> Cc+ Linus
>
> > Of course, even if we make that change, it will only apply to _future_
> > binaries. That's why we prefer a kernel fix so that users will be able
> > to run our existing releases (or any containers that use them) on a
> > modern kernel.
>
> I understand that and as everyone else I would be happy to do that, but
> the price everyone pays for proliferating the tcmalloc insanity is not
> cheap either.
>
> So let me recap the whole situation and how we got there:
>
> 1) The original RSEQ implementation updates the rseq::cpu_id_start
> field in user space more or less unconditionally on every exit to
> user, whether the CPU/MMCID have been changed or not.
>
> That went unnoticed for years because nothing used rseq aside from
> Google and tcmalloc. Once glibc registered rseq, this resulted in an
> up to 15% performance penalty for syscall-heavy workloads.
>
> 2) The rseq::cpu_id_start field is documented as read only for user
> space in the ABI contract and guaranteed to be updated by the
> kernel when a task is migrated to a different CPU.
>
> 3) The RO for userspace property has been enforced by RSEQ debugging
> mode since day one. If such a debug enabled kernel detects user
> space changing the field it kills the task/application.
The optimization in TCMalloc that you're describing has been available
since September 2023:
https://github.com/google/tcmalloc/commit/aaa4fbf6fcdce1b7f86fcadd659874645c75ddb9
I thought the RSEQ debug checks were added in December 2024:
https://github.com/torvalds/linux/commit/7d5265ffcd8b41da5e09066360540d6e0716e9cd,
but perhaps I misidentified the ones in question.
>
> 4) tcmalloc abused the suboptimal implementation (see #1) and
> scribbled over rseq::cpu_id_start for their own nefarious purposes.
>
> 5) As a consequence of #4 tcmalloc cannot be used on a RSEQ debug
> enabled kernel. Which means a developer cannot validate his RSEQ
> code against a debug kernel when tcmalloc is in use on the system
> as that would crash the tcmalloc dependent applications due to #3.
>
> 6) As a consequence of #4 tcmalloc cannot be used together with any
> other facility/library which wants to utilize the ABI guaranteed
> properties of rseq::cpu_id_start in the same application.
>
> 7) tcmalloc has violated the ABI since day one and has since refused to
> address the problem despite being offered a kernel side rseq
> extension to solve it many years ago.
I know there was some discussion around a preemption notification
scheme, rseq_sched_state, but I thought the discussion moved in favor
of the timeslice extension interface that recently landed. Timeslice
extension solves some use cases, but I'm not sure it addresses this
one.
>
> 8) When addressing the performance issues of RSEQ the unconditional
> update ceased to exist under the valid assumption that the kernel
> only has to satisfy the guaranteed ABI properties, especially when
> they are enforceable by RSEQ debug.
>
> As a consequence this exposed the tcmalloc ABI violation because
> the unconditional pointless overwriting of something which did not
> change stopped happening.
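For anyone following along, points #1, #4 and #8 can be illustrated with a
toy C model. This is only a sketch under stated assumptions: struct toy_rseq,
TOY_USER_MARKER and the toy_* functions are made-up names, not the real
struct rseq layout, not kernel code and not TCMalloc code. It shows why a
value written into cpu_id_start by user space used to be overwritten on
every exit to user space and now survives when the task has not migrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two fields discussed; NOT the real struct rseq. */
struct toy_rseq {
    uint32_t cpu_id_start;
    uint32_t cpu_id;
};

/* Hypothetical marker a user-space library might write into the field. */
#define TOY_USER_MARKER 0xFFFF0000u

/* Old behaviour (#1): rewrite the fields on every exit to user space. */
static void toy_exit_unconditional(struct toy_rseq *rs, uint32_t cpu)
{
    assert(rs != NULL);
    rs->cpu_id_start = cpu;
    rs->cpu_id = cpu;
}

/* New behaviour (#8): rewrite the fields only if the task migrated. */
static void toy_exit_conditional(struct toy_rseq *rs, uint32_t cpu,
                                 bool migrated)
{
    assert(rs != NULL);
    if (!migrated)
        return;
    rs->cpu_id_start = cpu;
    rs->cpu_id = cpu;
}

/*
 * User space overwrites the read-only field (#4), then the task goes
 * through one exit to user space on the same CPU. Returns true if the
 * marker written by user space is still there afterwards.
 */
static bool toy_marker_survives(bool unconditional)
{
    struct toy_rseq rs = { .cpu_id_start = 3, .cpu_id = 3 };

    rs.cpu_id_start = TOY_USER_MARKER;
    if (unconditional)
        toy_exit_unconditional(&rs, 3);
    else
        toy_exit_conditional(&rs, 3, false); /* same CPU, no migration */
    return rs.cpu_id_start == TOY_USER_MARKER;
}
```

In this model toy_marker_survives(true) is false and
toy_marker_survives(false) is true: code that depends on the marker being
replaced after every return from the kernel only works with the
unconditional update.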
>
> Due to #4 everyone is in a hard place and up a creek without a paddle.
>
> Here are the possible solutions:
>
> A) Mathias suggested to force overwrite rseq::cpu_id_start every time
> the rseq::rseq_cs field is cleared by the kernel under the not yet
> validated theoretical assumption that this cures the problem for
> tcmalloc.
>
> If that's sufficient that would be harmless performance wise
> because the write would be inside the already existing STAC/CLAC
> section and just add some more noise to the rseq critical section
> operations.
>
> That would allow existing tcmalloc usage to continue, but
> obviously would solve neither #5 nor #6 above, nor provide an
> incentive for tcmalloc to actually fix their crap.
>
> B) If that's not sufficient then keeping tcmalloc alive would require
> going back to the previous state and letting everyone else pay the
> price in terms of performance overhead.
>
> C) Declare that this is not a regression because the ABI guarantee is
> not violated and the RO property has been enforceable by RSEQ
> debugging since day one.
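To make option A concrete, here is a toy C model of it. It is a sketch
only: struct toy_rseq, TOY_USER_MARKER and the toy_* helpers are
hypothetical names, and the rule that the kernel clears rseq_cs only when it
is nonzero is an assumption of the model, not a statement about the real
code path. It says nothing about whether option A is sufficient for
tcmalloc, which is still unvalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two fields involved; NOT the real struct rseq. */
struct toy_rseq {
    uint32_t cpu_id_start;
    uint64_t rseq_cs; /* nonzero while a critical section is registered */
};

/* Hypothetical value user space wrote into cpu_id_start beforehand. */
#define TOY_USER_MARKER 0xFFFF0000u

/*
 * Toy version of the kernel clearing rseq_cs. With option_a set, the
 * same pass also rewrites cpu_id_start, so the extra store rides along
 * with a write that already happens.
 */
static void toy_clear_rseq_cs(struct toy_rseq *rs, uint32_t cpu,
                              bool option_a)
{
    assert(rs != NULL);
    if (rs->rseq_cs == 0)
        return; /* nothing to clear, so nothing extra is written */
    rs->rseq_cs = 0;
    if (option_a)
        rs->cpu_id_start = cpu;
}

/* Returns cpu_id_start after one clearing pass on (hypothetical) CPU 5. */
static uint32_t toy_cpu_id_start_after_clear(uint64_t rseq_cs, bool option_a)
{
    struct toy_rseq rs = {
        .cpu_id_start = TOY_USER_MARKER,
        .rseq_cs = rseq_cs,
    };

    toy_clear_rseq_cs(&rs, 5, option_a);
    return rs.cpu_id_start;
}
```

In this model the marker is replaced by the CPU number only when a critical
section was registered and option A is enabled; in every other case the
value written by user space stays in place.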
>
> In my opinion #C is the right thing to do, but I can see a case being
> made for the lightweight fix Mathias suggested (#A) _if_ and only _if_
> that is sufficient. Picking #A would also mean that user space people
> have to take up the fight against tcmalloc when they want to use the
> RSEQ guaranteed ABI along with tcmalloc in the same application or use a
> RSEQ debug kernel to validate their own code.
>
> Going back to the full unconditional nightmare (#B) is not an option at
> all as everybody else would have to take the massive performance hit.
>
> Oh well...
>
> Thanks,
>
> tglx