[PATCH] arm64: uprobes: Optimize cache flushes for xol slot
Catalin Marinas
catalin.marinas at arm.com
Fri Nov 8 08:49:53 PST 2024
On Thu, 19 Sep 2024 12:17:19 +0000, Liao Chang wrote:
> Profiling the single-threaded selftests bench reveals a bottleneck in
> caches_clean_inval_pou() on ARM64. On my local test machine, this
> function takes approximately 34% of CPU cycles for trig-uprobe-nop and
> trig-uprobe-push.
>
> This patch adds a check to avoid unnecessary cache flushes when writing
> an instruction to the xol slot. If the instruction is the same as the
> one already in the slot, there is no need to synchronize the D/I caches.
> Since xol slot allocation and updates occur on the hot path of uprobe
> handling, testing the upstream kernel on a Kunpeng916 (Hi1616) with
> 4 NUMA nodes and 64 cores @ 2.4GHz shows that this optimization gives
> an obvious gain for the nop and push test cases.
>
> [...]
Applied to arm64 (for-next/misc), thanks!
[1/1] arm64: uprobes: Optimize cache flushes for xol slot
https://git.kernel.org/arm64/c/bdf94836c22a
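
For readers skimming the archive, the check described in the quoted
commit message can be illustrated with a minimal user-space sketch. This
is not the code from the patch: xol_slot_write(), sim_sync_icache(),
sim_flush_count and XOL_SLOT_BYTES are hypothetical names made up for the
illustration, and the cache maintenance step is replaced by a counter so
the effect of the comparison is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: a slot holds one 4-byte A64 instruction. */
#define XOL_SLOT_BYTES 4

/* Counts simulated cache maintenance operations (illustration only). */
static unsigned long sim_flush_count;

/* Hypothetical stand-in for the D/I cache synchronization step. */
static void sim_sync_icache(const void *start, size_t len)
{
	(void)start;
	(void)len;
	sim_flush_count++;
}

/*
 * Write an instruction into a slot, skipping both the copy and the
 * (simulated) cache synchronization when the slot already holds the
 * same bytes. Returns true if the slot contents were changed.
 */
static bool xol_slot_write(uint8_t *slot, const void *insn, size_t len)
{
	assert(len <= XOL_SLOT_BYTES);

	/* Same instruction already in the slot: nothing to synchronize. */
	if (memcmp(slot, insn, len) == 0)
		return false;

	memcpy(slot, insn, len);
	sim_sync_icache(slot, len);
	return true;
}
```

In this sketch, writing the same instruction to a slot twice bumps
sim_flush_count only once, which is the saving the commit message
describes for repeated hits on the same probe.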
--
Catalin