[PATCH 0/2] support batched checks of the references for large folios
David Hildenbrand (Red Hat)
david at kernel.org
Mon Dec 1 08:23:13 PST 2025
On 11/25/25 01:56, Baolin Wang wrote:
> Currently, folio_referenced_one() always checks the young flag for each PTE
> sequentially, which is inefficient for large folios. This inefficiency is
> especially noticeable when reclaiming clean file-backed large folios, where
> folio_referenced() is observed as a significant performance hotspot.
>
> Moreover, on Arm architecture, which supports contiguous PTEs, there is already
> an optimization to clear the young flags for PTEs within a contiguous range.
> However, this is not sufficient. We can extend this to perform batched operations
> for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).
>
> By supporting batched checking of the young flags and flushing TLB entries,
> I observed a 33% performance improvement in my file-backed folios reclaim tests.
Can you point at the benchmark or briefly explain what it does? What
exactly are we measuring that improves by 33%?
--
Cheers
David