[PATCH v2 1/2] arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
Reiji Watanabe
reijiw at google.com
Thu Nov 18 00:18:46 PST 2021
On Tue, Nov 16, 2021 at 3:00 PM Robin Murphy <robin.murphy at arm.com> wrote:
>
> On 2021-11-08 07:11, Reiji Watanabe wrote:
> > Currently, clear_page() uses DC ZVA instruction unconditionally. But it
> > should make sure that DCZID_EL0.DZP, which indicates whether or not use
> > of DC ZVA instruction is prohibited, is zero when using the instruction.
> > Use STP as memset does instead when DCZID_EL0.DZP == 1.
>
> Reviewed-by: Robin Murphy <robin.murphy at arm.com>
>
> FWIW I did eventually figure out the "pre-bias" trick from v1 thanks to
> Mark's nod toward the original context, but a quick survey of various
> optimisation guides implied that the explicit add should generally be
> preferred over post-index writeback anyway, so I think we're all good here.
Thank you for the review!

The original code, which used *pre*-index (not post-index) addressing,
showed no significant difference in clear_page() performance from the
current code in my test environment.

Now, I am looking at creating v3 patches to use stnp instead of stp
in clear_page() (NOTE: DC ZVA shows much better performance on my test
system than stp/stnp).

Although stnp showed no significant difference from stp in clear_page()
performance on my test system (nor in the cache-misses, cache_refill,
cache_wb, or cache_allocate event counters), stnp should be more
appropriate for clear_page() than stp, and I understand it could show
better performance on some CPUs.

Thanks,
Reiji
>
> Cheers,
> Robin.
>
> > Fixes: f27bb139c387 ("arm64: Miscellaneous library functions")
> > Signed-off-by: Reiji Watanabe <reijiw at google.com>
> > ---
> > arch/arm64/lib/clear_page.S | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
> > index b84b179edba3..ea3a6c927fe8 100644
> > --- a/arch/arm64/lib/clear_page.S
> > +++ b/arch/arm64/lib/clear_page.S
> > @@ -16,6 +16,7 @@
> > */
> > SYM_FUNC_START_PI(clear_page)
> > mrs x1, dczid_el0
> > + tbnz x1, #4, 2f /* Branch if DC ZVA is prohibited */
> > and w1, w1, #0xf
> > mov x2, #4
> > lsl x1, x2, x1
> > @@ -25,5 +26,14 @@ SYM_FUNC_START_PI(clear_page)
> > tst x0, #(PAGE_SIZE - 1)
> > b.ne 1b
> > ret
> > +
> > +2: stp xzr, xzr, [x0]
> > + stp xzr, xzr, [x0, #16]
> > + stp xzr, xzr, [x0, #32]
> > + stp xzr, xzr, [x0, #48]
> > + add x0, x0, #64
> > + tst x0, #(PAGE_SIZE - 1)
> > + b.ne 2b
> > + ret
> > SYM_FUNC_END_PI(clear_page)
> > EXPORT_SYMBOL(clear_page)
> >
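
For readers following the control flow of the patch above, here is a rough
C model of it. It is only a sketch, not the kernel code: the names
zva_block_bytes(), clear_page_model() and SKETCH_PAGE_SIZE are made up for
illustration, a 4 KiB page is assumed, and memset() stands in for both
DC ZVA and the four stp stores. The guard against a block larger than the
page is an addition for safety in the model and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define DCZID_DZP        (1u << 4)  /* bit 4: DC ZVA prohibited */
#define DCZID_BS_MASK    0xfu       /* bits [3:0]: log2(block size in words) */

/* DC ZVA block size in bytes, or 0 when DZP says DC ZVA must not be used. */
static size_t zva_block_bytes(uint64_t dczid)
{
    if (dczid & DCZID_DZP)      /* models: tbnz x1, #4, 2f */
        return 0;
    /* models: and w1, w1, #0xf; mov x2, #4; lsl x1, x2, x1 */
    return (size_t)4 << (dczid & DCZID_BS_MASK);
}

/*
 * Zero one page and return the number of loop iterations taken.
 * memset() is a stand-in for the real stores (DC ZVA or stp).
 */
static size_t clear_page_model(unsigned char *page, uint64_t dczid)
{
    size_t step = zva_block_bytes(dczid);
    size_t iters = 0;

    assert(page != NULL);

    /*
     * Fallback path (label 2 in the patch): 64 bytes per iteration.
     * The "step > page" check is a guard added for this model only.
     */
    if (step == 0 || step > SKETCH_PAGE_SIZE)
        step = 64;

    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += step) {
        memset(page + off, 0, step);
        iters++;
    }
    return iters;
}
```

With a DCZID_EL0 value of 0x14 (DZP set, BS = 4) the model takes the
64-byte fallback loop; with 0x04 it uses the 64-byte block size reported
by BS.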