[PATCH v2 1/2] arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

Reiji Watanabe reijiw at google.com
Thu Nov 18 00:18:46 PST 2021


On Tue, Nov 16, 2021 at 3:00 PM Robin Murphy <robin.murphy at arm.com> wrote:
>
> On 2021-11-08 07:11, Reiji Watanabe wrote:
> > Currently, clear_page() uses DC ZVA instruction unconditionally.  But it
> > should make sure that DCZID_EL0.DZP, which indicates whether or not use
> > of DC ZVA instruction is prohibited, is zero when using the instruction.
> > Use STP as memset does instead when DCZID_EL0.DZP == 1.
>
> Reviewed-by: Robin Murphy <robin.murphy at arm.com>
>
> FWIW I did eventually figure out the "pre-bias" trick from v1 thanks to
> Mark's nod toward the original context, but a quick survey of various
> optimisation guides implied that the explicit add should generally be
> preferred over post-index writeback anyway, so I think we're all good here.

Thank you for the review!
The original code, which used *pre*-index (not post-index) addressing,
made no significant difference in clear_page() performance on my test
environment compared with the current code.

I am now looking at creating v3 patches that use stnp instead of stp
in clear_page() (NOTE: DC ZVA shows much better performance on my test
system than either stp or stnp).

Although using stnp instead of stp made no significant difference in
clear_page() performance on my test system (nor in the cache-misses,
cache_refill, cache_wb, or cache_allocate event counters), stnp should
be more appropriate for clear_page() than stp, and I understand it
could perform better on some CPUs.

Thanks,
Reiji




>
> Cheers,
> Robin.
>
> > Fixes: f27bb139c387 ("arm64: Miscellaneous library functions")
> > Signed-off-by: Reiji Watanabe <reijiw at google.com>
> > ---
> >   arch/arm64/lib/clear_page.S | 10 ++++++++++
> >   1 file changed, 10 insertions(+)
> >
> > diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
> > index b84b179edba3..ea3a6c927fe8 100644
> > --- a/arch/arm64/lib/clear_page.S
> > +++ b/arch/arm64/lib/clear_page.S
> > @@ -16,6 +16,7 @@
> >    */
> >   SYM_FUNC_START_PI(clear_page)
> >       mrs     x1, dczid_el0
> > +     tbnz    x1, #4, 2f      /* Branch if DC ZVA is prohibited */
> >       and     w1, w1, #0xf
> >       mov     x2, #4
> >       lsl     x1, x2, x1
> > @@ -25,5 +26,14 @@ SYM_FUNC_START_PI(clear_page)
> >       tst     x0, #(PAGE_SIZE - 1)
> >       b.ne    1b
> >       ret
> > +
> > +2:   stp     xzr, xzr, [x0]
> > +     stp     xzr, xzr, [x0, #16]
> > +     stp     xzr, xzr, [x0, #32]
> > +     stp     xzr, xzr, [x0, #48]
> > +     add     x0, x0, #64
> > +     tst     x0, #(PAGE_SIZE - 1)
> > +     b.ne    2b
> > +     ret
> >   SYM_FUNC_END_PI(clear_page)
> >   EXPORT_SYMBOL(clear_page)
> >
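
For readers following the quoted diff, here is a minimal C sketch of the
DCZID_EL0 decoding that the assembly performs: bit 4 (DZP) says whether
DC ZVA is prohibited, and bits [3:0] (BS) give log2 of the block size in
4-byte words, so the block size in bytes is 4 << BS. The register value
is passed in as a plain integer, and the names dczid_info and
decode_dczid are hypothetical, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not a kernel structure. */
struct dczid_info {
    bool zva_prohibited;      /* DCZID_EL0.DZP (bit 4) */
    unsigned int block_bytes; /* 4 << DCZID_EL0.BS (bits [3:0]) */
};

/* Decode a raw DCZID_EL0 value the same way the patched clear_page() does. */
static struct dczid_info decode_dczid(uint64_t dczid)
{
    struct dczid_info info;

    /* Mirrors "tbnz x1, #4, 2f": bit 4 set means DC ZVA must not be used. */
    info.zva_prohibited = ((dczid >> 4) & 1) != 0;

    /* Mirrors "and w1, w1, #0xf; mov x2, #4; lsl x1, x2, x1". */
    info.block_bytes = 4u << (dczid & 0xf);

    return info;
}
```

With this sketch, a raw value of 0x04 decodes to a 64-byte block with
DC ZVA permitted, while 0x14 decodes to the same block size with DC ZVA
prohibited, which is the case where the patch falls back to the stp loop.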


