[REGRESSION] rseq: refactoring in v6.19 broke everyone on arm64 and tcmalloc everywhere

Peter Zijlstra peterz at infradead.org
Tue Apr 28 01:03:59 PDT 2026


On Mon, Apr 27, 2026 at 12:04:48AM +0200, Thomas Gleixner wrote:

> +Optimized RSEQ V2
> +-----------------
> +
> +On architectures which utilize the generic entry code and generic TIF bits
> +the kernel supports runtime optimizations for RSEQ, which also enable
> +enhanced features like scheduler time slice extensions.
> +
> +To enable them a task has to register the RSEQ region with at least the
> +length advertised by getauxval(AT_RSEQ_FEATURE_SIZE).
> +
> +If existing binaries register with RSEQ_ORIG_SIZE (32 bytes), the kernel
> +keeps the legacy low performance mode enabled to fulfil the expectations
> +existing users regarding the original RSEQ implementation behaviour.
> +
> +The following table documents the ABI and behavioral guarantees of the
> +legacy and the optimized V2 mode.
> +
> +.. list-table:: RSEQ modes
> +   :header-rows: 1
> +
> +   * - Nr
> +     - What
> +     - Legacy
> +     - Optimized V2
> +   * - 1
> +     - The cpu_id_start, cpu_id, node_id and mm_cid fields (User mode read
> +       only)
> +     - Updated by the kernel unconditionally after each context switch and
> +       before signal delivery
> +     - Updated by the kernel if and only if they change, i.e. if the task
> +       is migrated or mm_cid changes
> +   * - 2
> +     - The rseq_cs critical section field
> +     - Evaluated and handled unconditionally after each context switch and
> +       before signal delivery
> +     - Evaluated and handled conditionally only when user space was
> +       interrupted. Either after being preempted or before signal delivery
> +       in the interrupted context.
> +   * - 3
> +     - Read only fields
> +     - No strict enforcement except in debug mode
> +     - Strict enforcement
> +   * - 4
> +     - membarrier(...RSEQ)
> +     - All running threads of the process are interrupted and the ID fields
> +       are rewritten and eventually active critical sections are aborted
> +       before they return to user space.  All threads which are scheduled
> +       out whether voluntary or not are covered by #1/#2 above.
> +     - All running threads of the process are interrupted and eventually
> +       active critical sections are aborted before these threads return to
> +       user space. The ID fields are only updated if changed as a
> +       consequence of the interrupt. All threads which are scheduled out
> +       whether voluntary not are covered by #1/#2 above.
> +   * - 5
> +     - Time slice extensions
> +     - Not supported
> +     - Supported

I'm sure its cute when rendered, but when read as text this is nigh on
unreadable.

> +The legacy mode is obviously less performant as it does unconditional
> +updates and critical section checks even if not strictly required by the
> +ABI contract. That can't be changed anymore as some users depend on that
> +observed behavior, which in turn enables them to violate the ABI and
> +overwrite the cpu_id_start field for their own purposes. This is obviously
> +discouraged as it renders RSEQ incompatible with the intended usage and
> +breaks the expectation of other libraries in the same application.
> +
> +The ABI compliant optimized mode, which respects the read only fields, does
> +not require unconditional updates and therefore is way more performant. The
> +kernel validates the read only fields for compliance. If user space
> +modifies them, the process is killed. Compliant usage allows multiple
> +libraries in the same application to benefit from the RSEQ functionality
> +without disturbing each other.
> +



More information about the linux-arm-kernel mailing list