[PATCH v6 1/5] mm: rmap: support batched checks of the references for large folios

Baolin Wang baolin.wang at linux.alibaba.com
Fri Mar 6 18:22:51 PST 2026



On 3/7/26 5:07 AM, Barry Song wrote:
> On Mon, Feb 9, 2026 at 10:07 PM Baolin Wang
> <baolin.wang at linux.alibaba.com> wrote:
>>
>> Currently, folio_referenced_one() always checks the young flag for each PTE
>> sequentially, which is inefficient for large folios. This inefficiency is
>> especially noticeable when reclaiming clean file-backed large folios, where
>> folio_referenced() is observed as a significant performance hotspot.
>>
>> Moreover, on the Arm64 architecture, which supports contiguous PTEs, there is
>> already an optimization to clear the young flags for PTEs within a contiguous
>> range. However, this is not sufficient: we can extend it to perform batched
>> operations over the entire large folio (which might exceed the contiguous
>> range, CONT_PTE_SIZE).
>>
>> Introduce a new API, clear_flush_young_ptes(), to facilitate batched checking
>> of the young flags and flushing of TLB entries, thereby improving performance
>> during large folio reclamation. It will be overridden by architectures that
>> implement a more efficient batch operation in the following patches.
>>
>> While we are at it, rename ptep_clear_flush_young_notify() to
>> clear_flush_young_ptes_notify() to indicate that this is a batch operation.
>>
>> Reviewed-by: Harry Yoo <harry.yoo at oracle.com>
>> Reviewed-by: Ryan Roberts <ryan.roberts at arm.com>
>> Signed-off-by: Baolin Wang <baolin.wang at linux.alibaba.com>
> 
> LGTM,
> 
> Reviewed-by: Barry Song <baohua at kernel.org>

Thanks.

>> ---
>>   include/linux/mmu_notifier.h |  9 +++++----
>>   include/linux/pgtable.h      | 35 +++++++++++++++++++++++++++++++++++
>>   mm/rmap.c                    | 28 +++++++++++++++++++++++++---
>>   3 files changed, 65 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
>> index d1094c2d5fb6..07a2bbaf86e9 100644
>> --- a/include/linux/mmu_notifier.h
>> +++ b/include/linux/mmu_notifier.h
>> @@ -515,16 +515,17 @@ static inline void mmu_notifier_range_init_owner(
>>          range->owner = owner;
>>   }
>>
>> -#define ptep_clear_flush_young_notify(__vma, __address, __ptep)                \
>> +#define clear_flush_young_ptes_notify(__vma, __address, __ptep, __nr)  \
>>   ({                                                                     \
>>          int __young;                                                    \
>>          struct vm_area_struct *___vma = __vma;                          \
>>          unsigned long ___address = __address;                           \
>> -       __young = ptep_clear_flush_young(___vma, ___address, __ptep);   \
>> +       unsigned int ___nr = __nr;                                      \
>> +       __young = clear_flush_young_ptes(___vma, ___address, __ptep, ___nr);    \
>>          __young |= mmu_notifier_clear_flush_young(___vma->vm_mm,        \
>>                                                    ___address,           \
>>                                                    ___address +          \
>> -                                                       PAGE_SIZE);     \
>> +                                                 ___nr * PAGE_SIZE);   \
>>          __young;                                                        \
>>   })
>>
>> @@ -650,7 +651,7 @@ static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
>>
>>   #define mmu_notifier_range_update_to_read_only(r) false
>>
>> -#define ptep_clear_flush_young_notify ptep_clear_flush_young
>> +#define clear_flush_young_ptes_notify clear_flush_young_ptes
>>   #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
>>   #define ptep_clear_young_notify ptep_test_and_clear_young
>>   #define pmdp_clear_young_notify pmdp_test_and_clear_young
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 21b67d937555..a50df42a893f 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -1068,6 +1068,41 @@ static inline void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
>>   }
>>   #endif
>>
>> +#ifndef clear_flush_young_ptes
>> +/**
>> + * clear_flush_young_ptes - Mark PTEs that map consecutive pages of the same
>> + *                         folio as old and flush the TLB.
>> + * @vma: The virtual memory area the pages are mapped into.
>> + * @addr: Address the first page is mapped at.
>> + * @ptep: Page table pointer for the first entry.
>> + * @nr: Number of entries to clear the access bit for.
>> + *
>> + * May be overridden by the architecture; otherwise, implemented as a simple
>> + * loop over ptep_clear_flush_young().
>> + *
>> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
>> + * some PTEs might be write-protected.
>> + *
>> + * Context: The caller holds the page table lock.  The PTEs map consecutive
>> + * pages that belong to the same folio.  The PTEs are all in the same PMD.
>> + */
>> +static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
>> +               unsigned long addr, pte_t *ptep, unsigned int nr)
>> +{
>> +       int young = 0;
>> +
>> +       for (;;) {
>> +               young |= ptep_clear_flush_young(vma, addr, ptep);
>> +               if (--nr == 0)
>> +                       break;
>> +               ptep++;
>> +               addr += PAGE_SIZE;
>> +       }
>> +
>> +       return young;
>> +}
>> +#endif
> 
> We might have an opportunity to batch the TLB synchronization,
> using flush_tlb_range() instead of calling flush_tlb_page()
> one by one. Not sure the benefit would be significant though,
> especially if only one entry among nr has the young bit set.

Yes. In addition, this will involve many architectures' implementations 
and their differing TLB flush mechanisms, so it's difficult to make a 
reasonable per-architecture measurement. If any architecture has a more 
efficient flush method, I'd prefer to implement an architecture-specific 
clear_flush_young_ptes().



More information about the linux-arm-kernel mailing list