[PATCH] arm64: clear_user: align __arch_clear_user() to 64B for I-cache efficiency
Will Deacon
will at kernel.org
Mon Nov 24 05:38:25 PST 2025
On Fri, Nov 21, 2025 at 12:04:55AM -0500, Luke Yang wrote:
> On aarch64 kernels, recent changes (specifically irqbypass patch
> https://lore.kernel.org/all/20250516230734.2564775-6-seanjc@google.com/)
> shifted __arch_clear_user() such that the tight zeroing loop straddles
> I-cache lines. This causes measurable read performance regression when
> reading from /dev/zero.
>
> Add `.p2align 6` (64-byte alignment) to guarantee the loop stays within a
> single I-cache boundary, restoring the previous IPC and throughput.
Hmm, but what's special about __arch_clear_user()? If we make this change,
anybody could surely make similar arguments for other functions on their
hot paths?
Amusingly, there's CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B to play around
with that, and it sounds like the irqbypass change you cite falls into
the category of changes highlighted by the Kconfig text.
Will
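
For readers following the alignment argument, here is a minimal sketch of the arithmetic behind "straddles I-cache lines". It is not taken from the patch: the 64-byte line size, the 32-byte loop size and the 0x1030 start address are all hypothetical, and it simplifies by treating the loop start as the aligned address, whereas the patch aligns the function entry.

```python
CACHE_LINE = 64  # assumed I-cache line size in bytes; real CPUs vary


def lines_touched(start: int, size: int, line: int = CACHE_LINE) -> int:
    """Count the cache lines covered by the bytes [start, start + size)."""
    if size <= 0:
        return 0
    first = start // line
    last = (start + size - 1) // line
    return last - first + 1


def align_up(addr: int, p2: int) -> int:
    """Round addr up to a 2**p2 boundary, as `.p2align p2` does."""
    mask = (1 << p2) - 1
    return (addr + mask) & ~mask


# Hypothetical layout: a 32-byte loop starting 48 bytes into a line.
loop_size = 32
unaligned_start = 0x1030
aligned_start = align_up(unaligned_start, 6)  # 2**6 = 64 bytes

print(hex(aligned_start))                         # 0x1040
print(lines_touched(unaligned_start, loop_size))  # 2: loop crosses a line boundary
print(lines_touched(aligned_start, loop_size))    # 1: loop fits in one line
```

The same arithmetic applies to any hot function, which is the point of the question above: nothing in it is specific to __arch_clear_user().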