[PATCHv2 06/11] arm64: entry: move el1 irq/nmi logic to C
Mark Rutland
mark.rutland at arm.com
Thu May 6 03:58:32 PDT 2021
On Thu, May 06, 2021 at 06:25:40PM +0800, He Ying wrote:
>
> On 2021/5/6 17:16, Mark Rutland wrote:
> > On Thu, May 06, 2021 at 04:28:09PM +0800, He Ying wrote:
> > > Hi Mark,
> > Hi,
> >
> > > I have faced a performance regression for handling IPIs since this commit.
> > >
> > > I calculated the cycles from the entry of el1_irq to the entry of
> > > gic_handle_irq.
> > >
> > > From my test, this commit may add an average overhead of 200 cycles.
> > > Do you have any ideas about this? Looking forward to your reply.
> > On that path, the only meaningful difference is the call to
> > enter_el1_irq_or_nmi(), since that's now unconditional, and it's an
> > extra layer in the callchain.
> >
> > When either CONFIG_ARM64_PSEUDO_NMI or CONFIG_TRACE_IRQFLAGS are
> > selected, enter_el1_irq_or_nmi() is a wrapper for functions we'd already
> > call, and I'd expect the cost of the callees to dominate.
> >
> > When neither CONFIG_ARM64_PSEUDO_NMI nor CONFIG_TRACE_IRQFLAGS are
> > selected, this should add a trivial function that immediately returns,
> > and so 200 cycles seems excessive.
> >
> > Building that commit with defconfig, I see that GCC 10.1.0 generates:
> >
> > | ffff800010dfc864 <enter_el1_irq_or_nmi>:
> > | ffff800010dfc864: d503233f paciasp
> > | ffff800010dfc868: d50323bf autiasp
> > | ffff800010dfc86c: d65f03c0 ret
>
> CONFIG_ARM64_PSEUDO_NMI is not set in my test. And I generate a different
> object from yours:
>
> 00000000000002b8 <enter_el1_irq_or_nmi>:
>
> 2b8: d503233f paciasp
> 2bc: a9bf7bfd stp x29, x30, [sp, #-16]!
> 2c0: 91052000 add x0, x0, #0x148
> 2c4: 910003fd mov x29, sp
> 2c8: 97ffff57 bl 24 <enter_from_kernel_mode.isra.6>
> 2cc: a8c17bfd ldp x29, x30, [sp], #16
> 2d0: d50323bf autiasp
> 2d4: d65f03c0 ret
Which commit are you testing with?
The call to enter_from_kernel_mode() was introduced later in commit:
7cd1ea1010acbede ("arm64: entry: fix non-NMI kernel<->kernel transitions")
... and doesn't exist in commit:
105fc3352077bba5 ("arm64: entry: move el1 irq/nmi logic to C")
Do you see the 200 cycle penalty with 105fc3352077bba5 alone? ... or
only after the whole series is applied?
If enter_from_kernel_mode() is what's taking the bulk of the cycles,
then this is likely unavoidable work that was previously (erroneously)
omitted.
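
To make the difference between the two commits concrete, here is a
simplified userspace model of the fallback path. This is only a sketch
based on the description in this thread: the model_* names, the
MODEL_* switches and the call counters are all hypothetical stand-ins,
and the stub bodies are placeholders rather than the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build-time switches standing in for the Kconfig options. */
#ifndef MODEL_PSEUDO_NMI
#define MODEL_PSEUDO_NMI 0
#endif
#ifndef MODEL_TRACE_IRQFLAGS
#define MODEL_TRACE_IRQFLAGS 0
#endif

/* Hypothetical stand-in for struct pt_regs. */
struct model_regs {
	bool irqs_enabled;	/* were IRQs unmasked in the interrupted context? */
	bool exit_rcu;		/* bookkeeping written on the later path */
};

/* Call counters so the work done on each path can be observed. */
static int nmi_calls, trace_calls, rcu_calls;

static void model_enter_nmi(struct model_regs *regs)
{
	(void)regs;
	nmi_calls++;
}

static void model_trace_hardirqs_off(void)
{
	if (MODEL_TRACE_IRQFLAGS)
		trace_calls++;
}

static void model_enter_from_kernel_mode(struct model_regs *regs)
{
	regs->exit_rcu = false;
	rcu_calls++;		/* placeholder for the RCU accounting */
	model_trace_hardirqs_off();
}

/* Shape assumed for 105fc3352077bba5: no work with both options off. */
static void model_enter_v1(struct model_regs *regs)
{
	assert(regs != NULL);
	if (MODEL_PSEUDO_NMI && !regs->irqs_enabled)
		model_enter_nmi(regs);
	else
		model_trace_hardirqs_off();
}

/* Shape assumed after 7cd1ea1010acbede: the fallback does RCU accounting. */
static void model_enter_v2(struct model_regs *regs)
{
	assert(regs != NULL);
	if (MODEL_PSEUDO_NMI && !regs->irqs_enabled)
		model_enter_nmi(regs);
	else
		model_enter_from_kernel_mode(regs);
}
```

With both switches left at 0, model_enter_v1() touches none of the
counters, matching the three-instruction disassembly quoted above,
while model_enter_v2() always reaches the RCU placeholder, matching the
bl to enter_from_kernel_mode in your object.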
> > ... so perhaps the PACIASP and AUTIASP have an impact?
> I'm not sure...
> >
> > I have a few questions:
> >
> > * Which CPU do you see this on?
> Hisilicon hip05-d02.
> >
> > * Does that CPU implement pointer authentication?
> I'm not sure. How can I check?
Does the dmesg contain "Address authentication" anywhere?
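
As an alternative to grepping dmesg, userspace can look at the ELF
hwcaps. The following is a minimal sketch, assuming an arm64 Linux
userspace; the helper names are hypothetical, and the bit values are the
arm64 HWCAP_PACA and HWCAP_PACG definitions, provided as fallbacks in
case the libc headers don't define them. Note that these bits only show
what the kernel exposes to userspace, so a kernel built without
CONFIG_ARM64_PTR_AUTH would not set them even on capable hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#endif

/* arm64 hwcap bits; fallbacks in case the headers don't define them. */
#ifndef HWCAP_PACA
#define HWCAP_PACA (1UL << 30)	/* address authentication */
#endif
#ifndef HWCAP_PACG
#define HWCAP_PACG (1UL << 31)	/* generic authentication */
#endif

/* Hypothetical helper: is address authentication advertised in hwcap? */
bool hwcap_has_address_auth(unsigned long hwcap)
{
	return (hwcap & HWCAP_PACA) != 0;
}

/* Hypothetical helper: is generic authentication advertised in hwcap? */
bool hwcap_has_generic_auth(unsigned long hwcap)
{
	return (hwcap & HWCAP_PACG) != 0;
}

/* Print what the running system advertises (meaningful on arm64 only). */
void report_ptrauth(void)
{
#if defined(__linux__) && defined(__aarch64__)
	unsigned long hwcap = getauxval(AT_HWCAP);

	printf("address auth: %s\n",
	       hwcap_has_address_auth(hwcap) ? "yes" : "no");
	printf("generic auth: %s\n",
	       hwcap_has_generic_auth(hwcap) ? "yes" : "no");
#else
	puts("not an arm64 Linux build; nothing to report");
#endif
}
```

Calling report_ptrauth() from a small test program on the target would
print the two lines. On a CPU without pointer authentication, PACIASP
and AUTIASP sit in the HINT space and execute as NOPs.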
> >
> > * What kernel config are you using? e.g. is this seen with defconfig?
>
> Our own. But CONFIG_ARM64_PSEUDO_NMI is not set.
>
> Should I provide it as an attachment?
From your attachment I see that TRACE_IRQFLAGS and LOCKDEP aren't
selected either, so IIUC the only non-trivial bits in
enter_from_kernel_mode() will be the RCU accounting.
> > * What's the total cycle count from el1_irq to gic_handle_irq?
>
> Applying the patchset: 249 cycles.
>
> Reverting the patchset: 77 cycles.
>
> Maybe 170 cycles is more correct.
>
> >
> > * Does this measurably impact a real workload?
> It has some impact on a scheduling perf test.
Does it affect a real workload? i.e. not a microbenchmark?
Thanks,
Mark.