[PATCH] arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

Reiji Watanabe reijiw at google.com
Tue Oct 26 23:44:51 PDT 2021


On Tue, Oct 26, 2021 at 5:23 AM Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Tue, Oct 26, 2021 at 12:22:20PM +0100, Robin Murphy wrote:
> > On 2021-10-26 04:48, Reiji Watanabe wrote:
> > > Currently, clear_page() uses the DC ZVA instruction unconditionally.  But
> > > it should first check DCZID_EL0.DZP, which indicates whether use of the
> > > DC ZVA instruction is prohibited, and use the instruction only when DZP
> > > is zero.  When DCZID_EL0.DZP == 1, fall back to stp, as memset does.
> > >
> > > Signed-off-by: Reiji Watanabe <reijiw at google.com>
> > > ---
> > >   arch/arm64/lib/clear_page.S | 11 +++++++++++
> > >   1 file changed, 11 insertions(+)
> > >
> > > diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
> > > index b84b179edba3..7ce1bfa4081c 100644
> > > --- a/arch/arm64/lib/clear_page.S
> > > +++ b/arch/arm64/lib/clear_page.S
> > > @@ -16,6 +16,7 @@
> > >    */
> > >   SYM_FUNC_START_PI(clear_page)
> > >     mrs     x1, dczid_el0
> > > +   tbnz    x1, #4, 2f      /* Branch if DC GVA is prohibited */
>
> DCZID_EL0.DZP (AKA DCZID_EL0[4]) says whether all of DC {ZVA,GVA,GZVA}
> are prohibited. This loop uses DC ZVA, not DC GVA, so it'd be nice to
> s/GVA/ZVA/ here.

Thank you for catching it!  I will fix that.
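
For reference, here is a rough C sketch of the two DCZID_EL0 fields
involved; this is only an illustration of what the assembly reads (the
helper names are made up for the example), not code from the patch:

/*
 * DCZID_EL0[4] (DZP) == 1 means DC {ZVA,GVA,GZVA} are prohibited.
 * DCZID_EL0[3:0] (BS) is log2 of the block size in 4-byte words,
 * so the block size in bytes is 4 << BS -- which is exactly what
 * the and/mov/lsl sequence below computes.
 */
static inline unsigned long read_dczid_el0(void)
{
	unsigned long val;

	asm volatile("mrs %0, dczid_el0" : "=r" (val));
	return val;
}

static inline int dc_zva_prohibited(void)
{
	return (read_dczid_el0() >> 4) & 1;	/* DZP bit */
}

static inline unsigned long dc_zva_block_size(void)
{
	return 4UL << (read_dczid_el0() & 0xf);	/* 4 << BS bytes */
}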

> However, `DC GVA` and `DC GZVA` are both used in mte_set_mem_tag_range(),
> which'll need a similar update...

Yes, I'm aware of that, and mte_zero_clear_page_tags() needs to be
updated as well.  But since I'm not familiar with MTE (and I don't
have any plans to use it yet), I haven't worked on those changes, and
I'm not sure how I would test them.  I might fix them separately
later when I have time (most likely not soon, though).


> > >     and     w1, w1, #0xf
> > >     mov     x2, #4
> > >     lsl     x1, x2, x1
> > > @@ -25,5 +26,15 @@ SYM_FUNC_START_PI(clear_page)
> > >     tst     x0, #(PAGE_SIZE - 1)
> > >     b.ne    1b
> > >     ret
> > > +
> > > +2: mov     x1, #(PAGE_SIZE)
> > > +   sub     x0, x0, #16     /* Pre-bias. */
> >
> > Out of curiosity, what's this for? It's not like we need to worry about
> > PAGE_SIZE or page addresses being misaligned. I don't really see why we'd
> > need a different condition from the DC ZVA loop.
>
> I believe this was copied from arch/arm64/lib/memset.S, in the
> `.Lnot_short` case, where we have:

Yes, I used the same code as memset.  I couldn't come up with any
other implementation that achieves the same performance with fewer
instructions in the loop.
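
(For anyone puzzling over the pre-bias, here is a rough C analogue of
that loop -- a sketch only, with a made-up helper name; the real code
is the assembly quoted below.  The pre-bias lets the asm fold the
pointer advance into the last stp via pre-index writeback.)

/*
 * Sketch: zero `count' bytes, 64 at a time, the way the pre-biased
 * stp loop does.  Assumes count is a positive multiple of 64.
 */
#include <string.h>

static void zero_by_64(void *buf, long count)
{
	char *p = (char *)buf - 16;	/* Pre-bias. */

	count -= 64;
	do {
		memset(p + 16, 0, 64);	/* the four stp's */
		p += 64;		/* folded into the last stp
					 * via pre-index writeback */
		count -= 64;		/* subs */
	} while (count >= 0);		/* b.ge */
}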


> | .Lnot_short:
> |         sub     dst, dst, #16/* Pre-bias.  */
> |         sub     count, count, #64
> | 1:
> |         stp     A_l, A_l, [dst, #16]
> |         stp     A_l, A_l, [dst, #32]
> |         stp     A_l, A_l, [dst, #48]
> |         stp     A_l, A_l, [dst, #64]!
> |         subs    count, count, #64
> |         b.ge    1b
>
> > Robin.
> >
> > > +3: stp     xzr, xzr, [x0, #16]
> > > +   stp     xzr, xzr, [x0, #32]
> > > +   stp     xzr, xzr, [x0, #48]
> > > +   stp     xzr, xzr, [x0, #64]!
> > > +   subs    x1, x1, #64
> > > +   b.gt    3b
> > > +   ret
>
> FWIW, I'd also prefer consistency with the existing loop, i.e.
>
> 2:      stp     xzr, xzr, [x0, #0]
>         stp     xzr, xzr, [x0, #16]
>         stp     xzr, xzr, [x0, #32]
>         stp     xzr, xzr, [x0, #48]
>         add     x0, x0, #64
>         tst     x0, #(PAGE_SIZE - 1)
>         b.ne    2b
>         ret

Thank you for the suggestion.  Yes, I agree that would be more
consistent with the existing loop.  I will change this as you
suggested in the v2 patch.
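
For the record, the termination condition works because clear_page()
is always handed a page-aligned buffer, so the address wraps back to
a PAGE_SIZE boundary exactly when the whole page has been written.
Roughly, in C (a sketch only, with a made-up function name):

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096UL	/* assumed for the sketch */

/* `page' must be PAGE_SIZE-aligned, as clear_page()'s argument is. */
static void clear_page_stp(void *page)
{
	char *p = page;

	do {
		memset(p, 0, 64);	/* the four stp xzr, xzr pairs */
		p += 64;		/* add x0, x0, #64 */
	} while ((uintptr_t)p & (PAGE_SIZE - 1));	/* tst + b.ne */
}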

Thanks,
Reiji


