[PATCH 0/2] support batched checks of the references for large folios
Baolin Wang
baolin.wang at linux.alibaba.com
Mon Nov 24 16:56:49 PST 2025
Currently, folio_referenced_one() always checks the young flag for each PTE
sequentially, which is inefficient for large folios. This inefficiency is
especially noticeable when reclaiming clean file-backed large folios, where
folio_referenced() is observed as a significant performance hotspot.
Moreover, on the Arm architecture, which supports contiguous PTEs, there is already
an optimization to clear the young flags for PTEs within a contiguous range.
However, this is not sufficient. We can extend this to perform batched operations
for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).
By supporting batched checking of the young flags and batched flushing of TLB
entries, I observed a 33% performance improvement in my file-backed folio reclaim tests.
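
To illustrate the idea (this is not code from the patches), the following
userspace sketch contrasts a per-PTE check with a batched one. Everything in
it is an assumption made for illustration: PTEs are modeled as plain 64-bit
words, PTE_YOUNG is a stand-in bit, and flush_range() only counts calls in
place of a real TLB flush. The function names are hypothetical and do not
correspond to kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "young" (accessed) bit for this sketch only. */
#define PTE_YOUNG (UINT64_C(1) << 10)

/* Counts simulated TLB flushes so the two strategies can be compared. */
static unsigned int flush_count;

/* Stand-in for a TLB flush covering nr entries; it only counts calls. */
static void flush_range(size_t nr)
{
	(void)nr;
	flush_count++;
}

/* Per-PTE approach: test and clear each entry, flushing once per young PTE. */
static bool clear_young_sequential(uint64_t *ptes, size_t nr)
{
	bool young = false;

	assert(ptes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (ptes[i] & PTE_YOUNG) {
			ptes[i] &= ~PTE_YOUNG;
			flush_range(1);
			young = true;
		}
	}
	return young;
}

/* Batched approach: clear the flag across the whole range, then flush once. */
static bool clear_young_batched(uint64_t *ptes, size_t nr)
{
	bool young = false;

	assert(ptes != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (ptes[i] & PTE_YOUNG) {
			ptes[i] &= ~PTE_YOUNG;
			young = true;
		}
	}
	if (young)
		flush_range(nr);
	return young;
}
```

In this model, a 16-entry range with 8 young entries costs 8 simulated flushes
in the sequential version and 1 in the batched version, while both report the
range as referenced.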
BTW, I still noticed a hotspot in try_to_unmap() in my tests. I hope Barry can
resend the optimization patch for try_to_unmap() [1].
[1] https://lore.kernel.org/all/20250513084620.58231-1-21cnbao@gmail.com/
Baolin Wang (2):
  arm64: mm: support batch clearing of the young flag for large folios
  mm: rmap: support batched checks of the references for large folios

 arch/arm64/include/asm/pgtable.h | 23 ++++++++++++-----
 arch/arm64/mm/contpte.c          | 44 ++++++++++++++++++++++----------
 include/linux/mmu_notifier.h     |  9 ++++---
 include/linux/pgtable.h          | 19 ++++++++++++++
 mm/rmap.c                        | 22 ++++++++++++++--
 5 files changed, 92 insertions(+), 25 deletions(-)
--
2.47.3