[PATCH v1] mm/contpte: Optimize loop to reduce redundant operations

Barry Song 21cnbao at gmail.com
Sat Apr 12 00:18:16 PDT 2025


On Fri, Apr 11, 2025 at 8:03 PM David Laight
<david.laight.linux at gmail.com> wrote:
>
> On Fri, 11 Apr 2025 09:25:39 +1200
> Barry Song <21cnbao at gmail.com> wrote:
>
> > On Mon, Apr 7, 2025 at 9:23 PM Xavier <xavier_qy at 163.com> wrote:
> > >
> > > This commit optimizes the contpte_ptep_get function by adding early
> > >  termination logic. It checks if the dirty and young bits of orig_pte
> > >  are already set and skips redundant bit-setting operations during
> > >  the loop. This reduces unnecessary iterations and improves performance.
> > >
> > > Signed-off-by: Xavier <xavier_qy at 163.com>
> > > ---
> > >  arch/arm64/mm/contpte.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > > index bcac4f55f9c1..ca15d8f52d14 100644
> > > --- a/arch/arm64/mm/contpte.c
> > > +++ b/arch/arm64/mm/contpte.c
> > > @@ -163,17 +163,26 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
> > >
> > >         pte_t pte;
> > >         int i;
> > > +       bool dirty = false;
> > > +       bool young = false;
> > >
> > >         ptep = contpte_align_down(ptep);
> > >
> > >         for (i = 0; i < CONT_PTES; i++, ptep++) {
> > >                 pte = __ptep_get(ptep);
> > >
> > > -               if (pte_dirty(pte))
> > > +               if (!dirty && pte_dirty(pte)) {
> > > +                       dirty = true;
> > >                         orig_pte = pte_mkdirty(orig_pte);
> > > +               }
> > >
> > > -               if (pte_young(pte))
> > > +               if (!young && pte_young(pte)) {
> > > +                       young = true;
> > >                         orig_pte = pte_mkyoung(orig_pte);
> > > +               }
> > > +
> > > +               if (dirty && young)
> > > +                       break;
> >
> > This kind of optimization is always tricky. Dev previously tried a similar
> > approach to reduce the loop count, but it ended up causing performance
> > degradation:
> > https://lore.kernel.org/linux-mm/20240913091902.1160520-1-dev.jain@arm.com/
> >
> > So we may need actual data to validate this idea.
>
> You might win with 3 loops.
> The first looks for both 'dirty' and 'young'.
> If it finds only one it jumps to a different loop that continues
> the search but only looks for the other flag.

However, there's no concrete evidence that this loop is a hot path. It might
save a few instructions when the first PTE is both young and dirty, but in
many cases, none of the PTEs are either. Previous profiling indicates that
the actual hot path lies in the rmap walk to locate the page table entries
during folio_reference in the reclamation path.

I suspect the code you're optimizing isn't actually part of a hot path at all.

>
>         David
>
> >
> > >         }
> > >
> > >         return orig_pte;
> > > --
> > > 2.34.1
> > >
> >

Thanks
Barry



More information about the linux-arm-kernel mailing list