[PATCH v1] mm/contpte: Optimize loop to reduce redundant operations

Lance Yang ioworker0 at gmail.com
Tue Apr 8 02:17:27 PDT 2025


On Tue, Apr 8, 2025 at 12:19 AM Dev Jain <dev.jain at arm.com> wrote:
>
>
> Hi Xavier,
>
> On 07/04/25 7:01 pm, Lance Yang wrote:
> > On Mon, Apr 7, 2025 at 8:56 PM Xavier <xavier_qy at 163.com> wrote:
> >>
> >>
> >>
> >> Hi Lance,
> >>
> >> Thanks for your feedback, my response is as follows.
> >>
> >> --
> >> Thanks,
> >> Xavier
> >>
> >>
> >>
> >>
> >>
> >> At 2025-04-07 19:29:22, "Lance Yang" <ioworker0 at gmail.com> wrote:
> >>> Thanks for the patch. Would the following change be better?
> >>>
> >>> diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> >>> index 55107d27d3f8..64eb3b2fbf06 100644
> >>> --- a/arch/arm64/mm/contpte.c
> >>> +++ b/arch/arm64/mm/contpte.c
> >>> @@ -174,6 +174,9 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
> >>>
> >>>                if (pte_young(pte))
> >>>                        orig_pte = pte_mkyoung(orig_pte);
> >>> +
> >>> +              if (pte_young(orig_pte) && pte_dirty(orig_pte))
> >>> +                      break;
> >>>        }
>
> Quite the coincidence, I was thinking of doing exactly this some days
> back and testing it out : ) Can you do a microanalysis whether this gets
> us a benefit or not? This looks like an optimization on paper but may
> not be one after all because CONT_PTES is only 16 and a simple loop
> without extra if-conditions may just be faster.

Yeah, I was thinking the same ;)

This change seems reasonable in theory. But with CONT_PTES=16, keeping
it simple might actually be faster, IMO.

Thanks,
Lance

>
> >>>
> >>>        return orig_pte;
> >>> --
> >>>
> >>> We can check the orig_pte flags directly instead of using extra boolean
> >>> variables, which gives us an early-exit when both dirty and young flags
> >>> are set.
> >> Your way of writing the code is indeed more concise. However, I think
> >>   using boolean variables might be more efficient. Although it introduces
> >>   additional variables, comparing boolean values is likely to be more
> >>   efficient than checking bit settings.
> >>
> >>>
> >>> Also, is this optimization really needed for the common case?
> >> This function is on a high-frequency execution path. During debugging,
> >>   I found that in most cases, the first few pages are already marked as
> >>   both dirty and young. But currently, the program still has to complete
> >>   the entire loop of 16 ptep iterations, which seriously reduces the efficiency.
> >
> > Hmm... agreed that this patch helps when early PTEs are dirty/young, but
> > for late-ones-only cases, it only introduces overhead with no benefit, IIUC.
> >
> > So, let's wait for folks to take a look ;)
> >
> > Thanks,
> > Lance
> >
> >>>
> >>> Thanks,
> >>> Lance
> >
>



More information about the linux-arm-kernel mailing list