[REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere

Wed Apr 22 06:09:09 PDT 2026

Hi Mathias,

On Wed, Apr 22, 2026 at 11:50:26AM +0200, Mathias Stearn wrote:
> TL;DR: As of 6.19, rseq no longer provides the documented atomicity
> guarantees on arm64 by failing to abort the critical section on same-core
> preemption/resumption. Additionally, it breaks tcmalloc specifically by
> failing to overwrite the cpu_id_start field at points where it was relied
> on for correctness.

Thanks for the report, and the test case.

As a holding reply, I'm looking into this now from the arm64 side.

I'll leave it to Thomas/Peter/Mathieu to comment w.r.t. the issue you
raise with cpu_id_start.

For some reason, this mail didn't make it to my inbox, and I had to grab
it from lore using b4. That might be a problem with my local mail
server; I'm just noting that in case others also didn't receive this.

Mark.

> This is a SEVERE breakage for MongoDB. We received several user reports of
> crashes on 6.19. I made a stress test that showed that 6.19 can cause
> malloc to return the same pointer twice without it being freed. Because
> that can cause arbitrary corruption, our latest releases have all been
> patched to refuse to start at all on 6.19+.
> 
> TCMalloc uses rseq in a "creative" way described at
> https://github.com/google/tcmalloc/blob/master/docs/rseq.md. In particular,
> the "Current CPU Slabs Pointer Caching" section describes an optimization
> that relies on an undocumented fact that the kernel was always overwriting
> cpu_id_start (even when it wouldn't change) to invalidate a user-space
> cache. Since the change to stop writing cpu_id_start seemed to be
> intentional as part of a refactoring merged in 2b09f480f0a1, I started
> working on a userspace patch to stop relying on that. Unfortunately, when
> that was complete, I ran into a wall that is impossible to work around from
> userspace.
> 
> On arm64, the kernel no longer meets the documented guarantee that rseq
> critical sections are atomic with respect to preemption. It seems to only
> abort the critical section when the thread is migrated to a different core.
> The attached test demonstrates this: it passes on x86 both before and
> after 6.19, and on arm before 6.19, but fails on arm with 6.19. It pins
> the process to a single core and then runs an rseq critical section that
> observes a change made by another thread, which is supposed to be
> impossible. I think this will break basically any real usage of rseq,
> other than just reading the current cpu_id.
> 
> An LLM pointed to these two specific commits in the refactor as causing
> this (oldest first):
> - 39a167560a61 rseq: Optimize event setting
>   This assumed that user_irq would be set on preemption but it wasn't on
>   arm64, so TIF_NOTIFY_RESUME isn't raised on same cpu preemption.
> - 566d8015f7ee rseq: Avoid CPU/MM CID updates when no event pending
>   This broke TCMalloc slab caching trick by not overwriting cpu_id_start on
>   every return to userspace
> 
> (I have a lot more analysis and suggested fixes from LLMs since I used them
> heavily in this testing and analysis, but I won't spam you with the slop
> unless requested)
> 
> The arm64 change is a clear breakage and I'm sure it will be
> uncontroversial to fix. I can imagine more resistance to reverting to the
> old behavior of always overwriting the cpu_id_start field since that seems
> to have been an intentional optimization choice. I have reached out to the
> TCMalloc maintainers (CC'd) and believe there is a solution that gets the
> vast majority of the optimization while still preserving the behavior that
> TCMalloc currently relies on[1].
> 
> Any time a critical section might be aborted (migration, preemption, signal
> delivery, and membarrier IPI), the kernel already must (but doesn't on
> arm64 at the moment) check the rseq_cs field to see if the thread is in a
> critical section, and is documented as nulling the pointer after (I assume
> to make later checks cheaper). It would be sufficient for tcmalloc's
> internal usage if every time the kernel nulled out rseq_cs, it also wrote
> the cpu id to cpu_id_start. That should be essentially free since you are
> already writing to the same cache line. It was pointed out that that could
> be an issue if another rseq user in the same thread nulled rseq_cs after
> its critical section, which would require the kernel to update cpu_id_start
> each time it checks rseq_cs, regardless of whether it nulls it. We aren't
> aware of any processes that mix tcmalloc with other rseq usages that null
> out the field from userspace, but we can't rule them out since it is open
> source. Either way, this preserves the property of not updating
> cpu_id_start on every syscall return and non-membarrier interrupts, which I
> assume is where the majority of the optimization win was from.
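
To make the two variants above concrete, here is a toy userspace model of
them. It is only a sketch, not kernel code and not tcmalloc's actual scheme:
struct rseq_model, MODEL_CACHED_BIT and all function names are made up for
illustration, and the "cache marker" simply stands in for whatever state
userspace keeps in cpu_id_start and expects a kernel write to clobber.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker: stands in for userspace state parked in
 * cpu_id_start that a kernel overwrite is expected to clobber. */
#define MODEL_CACHED_BIT 0x80000000u

/* Toy stand-in for the two fields discussed; NOT the real struct rseq. */
struct rseq_model {
    uint32_t cpu_id_start;
    uint64_t rseq_cs;
};

static void user_mark_cached(struct rseq_model *r)
{
    r->cpu_id_start |= MODEL_CACHED_BIT;
}

static bool user_cache_valid(const struct rseq_model *r)
{
    return (r->cpu_id_start & MODEL_CACHED_BIT) != 0;
}

/* Variant 1: rewrite cpu_id_start only when rseq_cs is actually nulled. */
static void abort_point_on_null(struct rseq_model *r, uint32_t cpu)
{
    if (r->rseq_cs != 0) {
        r->rseq_cs = 0;
        r->cpu_id_start = cpu;
    }
}

/* Variant 2: rewrite cpu_id_start every time rseq_cs is checked. */
static void abort_point_on_check(struct rseq_model *r, uint32_t cpu)
{
    r->rseq_cs = 0;
    r->cpu_id_start = cpu;
}

/* Scenario: another rseq user already nulled rseq_cs from userspace
 * before the abort point is reached. */
static void model_self_check(void)
{
    struct rseq_model r = { .cpu_id_start = 3, .rseq_cs = 0 };

    user_mark_cached(&r);
    abort_point_on_null(&r, 3);
    assert(user_cache_valid(&r));    /* variant 1: marker survives */

    abort_point_on_check(&r, 3);
    assert(!user_cache_valid(&r));   /* variant 2: marker is clobbered */
    assert(r.cpu_id_start == 3);
}
```

In this model, variant 1 misses the invalidation exactly in the case where
rseq_cs was already NULL at the abort point, while variant 2 pays one extra
store to a line it is already touching.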
> 
> All testing of problematic versions was performed on x86_64 and
> aarch64 Ubuntu 24.04.4 with the kernel manually upgraded to
> 6.19.8-061908-generic. Source analysis was performed on the v6.19 tag. I
> had a few AI agents confirm that nothing in the relevant changes to master
> should have solved this, but I have not yet tested there.
> 
> $ cat /proc/version
> Linux version 6.19.8-061908-generic (kernel at balboa)
> (aarch64-linux-gnu-gcc-15 (Ubuntu 15.2.0-15ubuntu1) 15.2.0, GNU ld (GNU
> Binutils for Ubuntu) 2.46) #202603131837 SMP PREEMPT_DYNAMIC Sat Mar 14
> 00:00:07 UTC 2026
> 
> [1]  There is also an exploration of some options to make tcmalloc not rely
> on the cpu_id_start overwriting. However we would strongly prefer that
> existing binaries continue to work on 6.19 kernels, even if newer binaries
> don't need that. At least for a good while.