[PATCH] arm64: uprobes: Optimize cache flushes for xol slot

Catalin Marinas catalin.marinas at arm.com
Fri Nov 8 08:49:53 PST 2024


On Thu, 19 Sep 2024 12:17:19 +0000, Liao Chang wrote:
> Profiling the single-threaded selftests bench reveals a bottleneck in
> caches_clean_inval_pou() on ARM64. On my local testing machine, this
> function takes approximately 34% of CPU cycles for trig-uprobe-nop and
> trig-uprobe-push.
> 
> This patch adds a check to avoid an unnecessary cache flush when writing
> an instruction to the xol slot. If the instruction is the same as the
> existing instruction in the slot, there is no need to synchronize the
> D/I cache. Since xol slot allocation and updates occur on the hot path
> of uprobe handling, testing the upstream kernel on Kunpeng916 (Hi1616),
> 4 NUMA nodes, 64 cores @ 2.4GHz, shows that this optimization gives a
> clear gain for the nop and push testcases.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: uprobes: Optimize cache flushes for xol slot
      https://git.kernel.org/arm64/c/bdf94836c22a
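
For archive readers, the check described in the quoted commit message can
be sketched in user space roughly as follows. This is only an
illustration under stated assumptions, not the applied kernel change:
copy_to_xol_slot(), fake_sync_icache(), XOL_SLOT_BYTES and sync_calls
are hypothetical names, and the counter stands in for the real D/I cache
maintenance that the profile attributes to caches_clean_inval_pou().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot size: one 4-byte A64 instruction. */
#define XOL_SLOT_BYTES 4

/* Counts how often the simulated cache maintenance runs. */
static unsigned long sync_calls;

/* Stand-in for the expensive D/I cache synchronization (assumption). */
static void fake_sync_icache(void *start, size_t len)
{
	(void)start;
	(void)len;
	sync_calls++;
}

/*
 * Write an instruction into a slot. If the slot already holds the same
 * bytes, return early so that neither the copy nor the cache
 * synchronization is performed.
 */
static void copy_to_xol_slot(uint8_t *slot, const void *insn, size_t len)
{
	assert(len <= XOL_SLOT_BYTES);

	if (memcmp(slot, insn, len) == 0)
		return;		/* slot unchanged, nothing to synchronize */

	memcpy(slot, insn, len);
	fake_sync_icache(slot, len);
}
```

In this sketch, writing the same instruction to a slot twice in a row
triggers the simulated synchronization only once, which is the situation
a benchmark that repeatedly hits the same probed instruction would be
expected to produce.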

-- 
Catalin



