Withdraw [PATCH] tracing: Enable kprobe tracing for Arm64 asm functions
Mark Rutland
mark.rutland at arm.com
Fri Dec 12 08:33:06 PST 2025

On Wed, Dec 10, 2025 at 12:16:17PM -0800, Ben Niu wrote:
> On Mon, Nov 17, 2025 at 10:34:22AM +0000, Mark Rutland wrote:
> > On Thu, Oct 30, 2025 at 11:07:51AM -0700, Ben Niu wrote:
> > > On Thu, Oct 30, 2025 at 12:35:25PM +0000, Mark Rutland wrote:
> > > > Is there something specific you want to trace, but cannot currently
> > > > trace (on arm64)?
> > >
> > > For some reason, we only saw Arm64 Linux asm functions __arch_copy_to_user and
> > > __arch_copy_from_user being hot in our workloads, not those counterpart asm
> > > functions on x86, so we are trying to understand and improve performance of
> > > those Arm64 asm functions.
> >
> > Are you sure that's not an artifact of those being out-of-line on arm64,
> > but inline on x86? On x86, the out-of-line forms are only used when the
> > CPU doesn't have FSRM, and when the CPU *does* have FSRM, the logic gets
> > inlined. See raw_copy_from_user(), raw_copy_to_user(), and
> > copy_user_generic() in arch/x86/include/asm/uaccess_64.h.
>
> On x86, INLINE_COPY_TO_USER is not defined in the latest linux kernel and our
> internal branch, so _copy_to_user is always defined as an extern function
> (no-inline), which ends up inlining copy_user_generic. copy_user_generic
> executes FSRM rep movs if CPU supports it (our case), otherwise, it calls
> rep_movs_alternative, which issues plain movs to copy memory.
>
> > Have you checked that inlining is not skewing your results, and
> > artificially making those look hotter on arm64 by virtue of centralizing
> > samples to the same IP/PC range?
>
> As mentioned above, _copy_to_user is not inlined on x86.

Thanks for confirming!

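
To illustrate the attribution question discussed above, here is a toy
model of how a flat profile assigns samples depending on whether a copy
routine is out-of-line or inlined into its callers. This is a sketch
only, not kernel code: struct call_site, copy_symbol_samples(),
caller_samples(), and the sample counts used with them are all
hypothetical and do not come from any real workload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
struct call_site {
	const char *caller;	/* symbol name of the calling function */
	unsigned long own;	/* samples in the caller's own code */
	unsigned long copy;	/* samples spent in the copy logic */
};

/*
 * Samples a flat profile attributes to the shared copy symbol.
 * When the copy logic is inlined, there is no separate symbol,
 * so nothing is attributed to it.
 */
unsigned long copy_symbol_samples(const struct call_site *sites, size_t n,
				  int copy_inlined)
{
	unsigned long total = 0;
	size_t i;

	assert(sites != NULL || n == 0);

	if (copy_inlined)
		return 0;

	for (i = 0; i < n; i++)
		total += sites[i].copy;

	return total;
}

/*
 * Samples a flat profile attributes to one caller. When the copy
 * logic is inlined, its samples land in the caller's PC range.
 */
unsigned long caller_samples(const struct call_site *site, int copy_inlined)
{
	assert(site != NULL);

	return site->own + (copy_inlined ? site->copy : 0);
}
```

With three hypothetical call sites spending 40, 30, and 50 samples in
the copy logic, the out-of-line case reports 120 samples under a single
symbol, while the inlined case reports none there and folds each share
into its caller. The total work is identical; only the attribution
differs.
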
> > Can you share any information on those workloads? e.g. which callchains
> > were hot?
>
> Please reach out to James Greenhalgh and Chris Goodyer at Arm for more details
> about those workloads, which I can't share in a public channel.

If you can't share this info publicly, that's fair enough.

Please note that upstream it's hard to justify changing things based on
confidential information. Sharing information with me in private isn't
all that helpful as it would not be clear what I could subsequently
share in public.

The reason that I've asked is that it would be very interesting to know
whether there's a specific subsystem, driver, or code path that's
hitting this hard, because a better option might be "don't do that", and
attempt to avoid the uaccesses entirely (e.g. accessing a kernel alias
with get_user_pages()).

If there's anything that you can share about that, it'd be very helpful.

Mark.

More information about the linux-arm-kernel mailing list