[PATCH] arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
Reiji Watanabe
reijiw at google.com
Tue Oct 26 23:44:51 PDT 2021
On Tue, Oct 26, 2021 at 5:23 AM Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Tue, Oct 26, 2021 at 12:22:20PM +0100, Robin Murphy wrote:
> > On 2021-10-26 04:48, Reiji Watanabe wrote:
> > > Currently, clear_page() uses DC ZVA instruction unconditionally. But it
> > > should make sure that DCZID_EL0.DZP, which indicates whether or not use
> > > of DC ZVA instruction is prohibited, is zero when using the instruction.
> > > Use stp as memset does instead when DCZID_EL0.DZP == 1.
> > >
> > > Signed-off-by: Reiji Watanabe <reijiw at google.com>
> > > ---
> > > arch/arm64/lib/clear_page.S | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
> > > index b84b179edba3..7ce1bfa4081c 100644
> > > --- a/arch/arm64/lib/clear_page.S
> > > +++ b/arch/arm64/lib/clear_page.S
> > > @@ -16,6 +16,7 @@
> > > */
> > > SYM_FUNC_START_PI(clear_page)
> > > mrs x1, dczid_el0
> > > + tbnz x1, #4, 2f /* Branch if DC GVA is prohibited */
>
> DCZID_EL0.DZP (AKA DCZID_EL0[4]) says whether all of DC {ZVA,GVA,GZVA}
> are prohibited. This loop uses DZ ZVA, not GC GVA, so it'd be nice to
> s/GVA/ZVA/ here.
Thank you for catching it ! I will fix that.
> Howver, `DC GVA` and `DC GZVA` are both used in mte_set_mem_tag_range(),
> which'll need a similar update...
Yes, I'm aware of that and mte_zero_clear_page_tags() needs to get
updated as well. But, Since I'm not familiar with MTE (and I don't
have any plans to use MTE yet), I didn't work on them (I'm not sure
how I can test them).
I might try to fix them separately later as well when I have time
(not so soon most likely though).
> > > and w1, w1, #0xf
> > > mov x2, #4
> > > lsl x1, x2, x1
> > > @@ -25,5 +26,15 @@ SYM_FUNC_START_PI(clear_page)
> > > tst x0, #(PAGE_SIZE - 1)
> > > b.ne 1b
> > > ret
> > > +
> > > +2: mov x1, #(PAGE_SIZE)
> > > + sub x0, x0, #16 /* Pre-bias. */
> >
> > Out of curiosity, what's this for? It's not like we need to worry about
> > PAGE_SIZE or page addresses being misaligned. I don't really see why we'd
> > need a different condition from the DC ZVA loop.
>
> I believe this was copied from arch/arm64/lib/memset.S, in the
> `.Lnot_short` case, where we have:
Yes, I used the same code as memset. I couldn't come up with any
other implementation that could get same performance with fewer
number of instructions in the loop than that.
> | .Lnot_short:
> | sub dst, dst, #16/* Pre-bias. */
> | sub count, count, #64
> | 1:
> | stp A_l, A_l, [dst, #16]
> | stp A_l, A_l, [dst, #32]
> | stp A_l, A_l, [dst, #48]
> | stp A_l, A_l, [dst, #64]!
> | subs count, count, #64
> | b.ge 1b
>
> > Robin.
> >
> > > +3: stp xzr, xzr, [x0, #16]
> > > + stp xzr, xzr, [x0, #32]
> > > + stp xzr, xzr, [x0, #48]
> > > + stp xzr, xzr, [x0, #64]!
> > > + subs x1, x1, #64
> > > + b.gt 3b
> > > + ret
>
> FWIW, I'd also prefer consistency with the existing loop, i.e.
>
> 2: stp xzr, xzr, [x0, #0]
> stp xzr, xzr, [x0, #16]
> stp xzr, xzr, [x0, #32]
> stp xzr, xzr, [x0, #48]
> add x0, x0, #64
> tst x0, #(PAGE_SIZE - 1)
> b.ne 2b
> ret
Thank you for the suggestion. Yes, I agree that it would be better
in terms of consistency with the existing loop.
I will fix this as you suggested in the v2 patch.
Thanks,
Reiji
More information about the linux-arm-kernel
mailing list