[PATCH 3/5] mm: add a batched helper to clear the young flag for large folios
Baolin Wang
baolin.wang at linux.alibaba.com
Wed Feb 25 19:42:12 PST 2026
On 2/25/26 10:04 PM, David Hildenbrand (Arm) wrote:
> On 2/24/26 02:56, Baolin Wang wrote:
>> Currently, MGLRU calls ptep_clear_young_notify() to check and clear the
>> young flag for each PTE sequentially, which is inefficient for large folio
>> reclamation.
>>
>> Moreover, on Arm64 architecture, which supports contiguous PTEs, the Arm64-
>> specific ptep_test_and_clear_young() already implements an optimization to
>> clear the young flags for PTEs within a contiguous range. However, this is not
>> sufficient. Similar to the Arm64 specific clear_flush_young_ptes(), we can
>> extend this to perform batched operations for the entire large folio (which
>> might exceed the contiguous range: CONT_PTE_SIZE).
>>
>> Thus, introduce a new batched helper, test_and_clear_young_ptes(), and its
>> wrapper clear_young_ptes_notify(), to check and clear the young flags of a
>> large folio in one batch, which improves performance during large folio
>> reclamation when MGLRU is enabled. Architectures that implement a more
>> efficient batch operation will override it in the following patches.
>>
>
> Maybe mention that the implementation follows the other existing functions.
Ack.
>> Signed-off-by: Baolin Wang <baolin.wang at linux.alibaba.com>
>> ---
>> include/linux/pgtable.h | 36 ++++++++++++++++++++++++++++++++++++
>> mm/internal.h | 23 ++++++++++++++++++-----
>> 2 files changed, 54 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 776993d4567b..0bcd3be524d3 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -1103,6 +1103,42 @@ static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
>> }
>> #endif
>>
>> +#ifndef test_and_clear_young_ptes
>> +/**
>> + * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
>> + * folio as old
>> + * @vma: The virtual memory area the pages are mapped into.
>> + * @addr: Address the first page is mapped at.
>> + * @ptep: Page table pointer for the first entry.
>> + * @nr: Number of entries to clear access bit.
>> + *
>> + * May be overridden by the architecture; otherwise, implemented as a simple
>> + * loop over ptep_test_and_clear_young().
>> + *
>> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
>> + * some PTEs might be write-protected.
>
> Document the return value?
>
> Returns: whether any PTE was young.
Ack.
>
> Or something like that.
>
>> + *
>> + * Context: The caller holds the page table lock. The PTEs map consecutive
>> + * pages that belong to the same folio. The PTEs are all in the same PMD.
>> + */
>> +static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep,
>> + unsigned int nr)
>
> Two tabs ...
Ack.
>
>> +{
>> +	int young = 0;
>> +
>> +	for (;;) {
>> +		young |= ptep_test_and_clear_young(vma, addr, ptep);
>> +		if (--nr == 0)
>> +			break;
>> +		ptep++;
>> +		addr += PAGE_SIZE;
>> +	}
>> +
>> +	return young;
>
> BTW: can this function simply return (and use) a bool instead?
>
> Likely we should do the same for the other functions, but that can be
> done separately.
Yes, I will add this to my TODO list and convert all related functions.
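
For illustration only, here is a minimal user-space sketch of what a bool-returning variant of the generic loop could look like. It is not the kernel code: model_pte_t, MODEL_PTE_YOUNG and the model_* function names are hypothetical stand-ins, and the vma and addr arguments are dropped because the model has nothing to do with them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's pte_t and its access (young) bit. */
typedef unsigned long model_pte_t;
#define MODEL_PTE_YOUNG (1UL << 0)

/* Models ptep_test_and_clear_young() for a single entry. */
static bool model_test_and_clear_young(model_pte_t *ptep)
{
	bool young = (*ptep & MODEL_PTE_YOUNG) != 0;

	*ptep &= ~MODEL_PTE_YOUNG;
	return young;
}

/*
 * Models the generic batched loop: clear the young bit on nr consecutive
 * entries and report whether any of them was young. nr must be at least 1,
 * because the loop decrements nr before testing it.
 */
static bool model_test_and_clear_young_ptes(model_pte_t *ptep, unsigned int nr)
{
	bool young = false;

	assert(nr > 0);
	for (;;) {
		young |= model_test_and_clear_young(ptep);
		if (--nr == 0)
			break;
		ptep++;
	}

	return young;
}
```

Every entry is still visited even after a young one is found, since the point is to clear the bit on the whole range, not just to test it.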
>> /*
>> * On some architectures hardware does not set page access bit when accessing
>> * memory page, it is responsibility of software setting this bit. It brings
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 1ba175b8d4f1..1b59be99dc3f 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -1813,16 +1813,23 @@ static inline int pmdp_clear_flush_young_notify(struct vm_area_struct *vma,
>> return young;
>> }
>>
>> -static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
>> - unsigned long addr, pte_t *ptep)
>> +static inline int clear_young_ptes_notify(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep,
>> + unsigned int nr)
>> {
>> 	int young;
>>
>> -	young = ptep_test_and_clear_young(vma, addr, ptep);
>> -	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
>> +	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
>> +	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);
>> 	return young;
>> }
>>
>> +static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep)
>> +{
>> +	return clear_young_ptes_notify(vma, addr, ptep, 1);
>> +}
>> +
>> static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
>> unsigned long addr, pmd_t *pmdp)
>> {
>> @@ -1837,9 +1844,15 @@ static inline int pmdp_clear_young_notify(struct vm_area_struct *vma,
>>
>> #define clear_flush_young_ptes_notify clear_flush_young_ptes
>> #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
>> -#define ptep_clear_young_notify ptep_test_and_clear_young
>> +#define clear_young_ptes_notify test_and_clear_young_ptes
>> #define pmdp_clear_young_notify pmdp_test_and_clear_young
>>
>> +static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
>> + unsigned long addr, pte_t *ptep)
>> +{
>> +	return test_and_clear_young_ptes(vma, addr, ptep, 1);
>> +}
>
> Why not a single generic definition outside of the ifdef:
>
> static inline int ptep_clear_young_notify(struct vm_area_struct *vma,
> 		unsigned long addr, pte_t *ptep)
> {
> 	return clear_young_ptes_notify(vma, addr, ptep, 1);
> }
Yes, will do. And this function will be removed in the following patch.
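
To make the suggested layout concrete, here is a rough user-space sketch, assuming a hypothetical SKETCH_MMU_NOTIFIER switch in place of the real config option and hypothetical sketch_* types and helpers in place of the kernel ones. The batched wrapper is either a real function or a plain alias depending on the ifdef, and the single-PTE wrapper is defined once after it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pte_t and the young bit. */
typedef unsigned long sketch_pte_t;
#define SKETCH_PTE_YOUNG (1UL << 0)

/* Batched helper: clears the young bit on nr entries, true if any was set. */
static bool sketch_test_and_clear_young_ptes(sketch_pte_t *ptep, unsigned int nr)
{
	bool young = false;

	assert(nr > 0);
	while (nr--) {
		young |= (*ptep & SKETCH_PTE_YOUNG) != 0;
		*ptep++ &= ~SKETCH_PTE_YOUNG;
	}

	return young;
}

#ifdef SKETCH_MMU_NOTIFIER
/* Hypothetical secondary-MMU hook; this model always reports "not young". */
static bool sketch_notifier_clear_young(unsigned int nr)
{
	(void)nr;
	return false;
}

static bool sketch_clear_young_ptes_notify(sketch_pte_t *ptep, unsigned int nr)
{
	bool young = sketch_test_and_clear_young_ptes(ptep, nr);

	young |= sketch_notifier_clear_young(nr);
	return young;
}
#else
/* Without notifiers the wrapper is just an alias for the batched helper. */
#define sketch_clear_young_ptes_notify sketch_test_and_clear_young_ptes
#endif

/* Single generic wrapper, defined once outside the ifdef. */
static bool sketch_ptep_clear_young_notify(sketch_pte_t *ptep)
{
	return sketch_clear_young_ptes_notify(ptep, 1);
}
```

With this shape, only the batched wrapper differs between the two configurations, so the single-PTE wrapper does not need to be duplicated in each branch.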