[PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios

Wei Yang richard.weiyang at gmail.com
Tue Jan 6 19:31:08 PST 2026


On Wed, Jan 07, 2026 at 10:29:18AM +0800, Baolin Wang wrote:
>
>
>On 1/7/26 10:21 AM, Barry Song wrote:
>> On Wed, Jan 7, 2026 at 2:46 PM Wei Yang <richard.weiyang at gmail.com> wrote:
>> > 
>> > On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
>> > > On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang at gmail.com> wrote:
>> > > > 
>> > > > On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
>> > > > > Similar to folio_referenced_one(), we can apply batched unmapping to file-backed
>> > > > > large folios to optimize the performance of file folio reclamation.
>> > > > > 
>> > > > > Barry previously implemented batched unmapping for lazyfree anonymous large
>> > > > > folios[1] and did not further optimize anonymous large folios or file-backed
>> > > > > large folios at that stage. As for file-backed large folios, the batched
>> > > > > unmapping support is relatively straightforward: we only need to clear the
>> > > > > consecutive (present) PTE entries that map the folio.
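>> > > > > 
>> > > > > To illustrate the idea (a rough sketch only, not the exact mm/rmap.c code;
>> > > > > the helpers named here are the existing batch APIs and the surrounding
>> > > > > variables are those of try_to_unmap_one()): once folio_unmap_pte_batch()
>> > > > > reports a run of nr_pages consecutive present PTEs, the whole run can be
>> > > > > torn down in one go instead of once per walk step:
>> > > > > 
>> > > > >         nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
>> > > > > 
>> > > > >         /* Clear the whole run of PTEs and fold the accounting into one update. */
>> > > > >         pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
>> > > > >         add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
>> > > > > 
>> > > > >         /* Drop the rmap and folio references for all nr_pages subpages at once. */
>> > > > >         folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
>> > > > >         folio_put_refs(folio, nr_pages);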
>> > > > > 
>> > > > > Performance testing:
>> > > > > Allocate 10G of clean file-backed folios by mmap() in a memory cgroup, then try to
>> > > > > reclaim 8G of file-backed folios via the memory.reclaim interface. I observe a
>> > > > > 75% performance improvement on my Arm64 32-core server (and a 50%+ improvement
>> > > > > on my x86 machine) with this patch.
>> > > > > 
>> > > > > W/o patch:
>> > > > > real    0m1.018s
>> > > > > user    0m0.000s
>> > > > > sys     0m1.018s
>> > > > > 
>> > > > > W/ patch:
>> > > > > real   0m0.249s
>> > > > > user   0m0.000s
>> > > > > sys    0m0.249s
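>> > > > > 
>> > > > > A minimal sketch of this kind of test (the file path, the cgroup path, the 4K
>> > > > > base page size, and the assumption that the task was moved into the cgroup
>> > > > > beforehand are illustrative only, not the actual setup behind the numbers above):
>> > > > > 
>> > > > > #include <fcntl.h>
>> > > > > #include <stdio.h>
>> > > > > #include <string.h>
>> > > > > #include <sys/mman.h>
>> > > > > #include <time.h>
>> > > > > #include <unistd.h>
>> > > > > 
>> > > > > #define SZ (10UL << 30)  /* 10G of clean, file-backed memory */
>> > > > > 
>> > > > > int main(void)
>> > > > > {
>> > > > >         /* Map and fault in a pre-created 10G file read-only, so the page
>> > > > >          * cache ends up holding 10G of clean file-backed folios. */
>> > > > >         int fd = open("/mnt/testfile", O_RDONLY);
>> > > > >         volatile char *p = mmap(NULL, SZ, PROT_READ, MAP_SHARED, fd, 0);
>> > > > >         struct timespec t0, t1;
>> > > > >         unsigned long i;
>> > > > > 
>> > > > >         for (i = 0; i < SZ; i += 4096)
>> > > > >                 (void)p[i];
>> > > > > 
>> > > > >         /* Ask the cgroup to reclaim 8G of it, and time that step. */
>> > > > >         fd = open("/sys/fs/cgroup/test/memory.reclaim", O_WRONLY);
>> > > > >         clock_gettime(CLOCK_MONOTONIC, &t0);
>> > > > >         write(fd, "8G", strlen("8G"));
>> > > > >         clock_gettime(CLOCK_MONOTONIC, &t1);
>> > > > >         printf("reclaim took %.3fs\n",
>> > > > >                (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
>> > > > >         return 0;
>> > > > > }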
>> > > > > 
>> > > > > [1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@gmail.com/T/#u
>> > > > > Reviewed-by: Ryan Roberts <ryan.roberts at arm.com>
>> > > > > Acked-by: Barry Song <baohua at kernel.org>
>> > > > > Signed-off-by: Baolin Wang <baolin.wang at linux.alibaba.com>
>> > > > > ---
>> > > > > mm/rmap.c | 7 ++++---
>> > > > > 1 file changed, 4 insertions(+), 3 deletions(-)
>> > > > > 
>> > > > > diff --git a/mm/rmap.c b/mm/rmap.c
>> > > > > index 985ab0b085ba..e1d16003c514 100644
>> > > > > --- a/mm/rmap.c
>> > > > > +++ b/mm/rmap.c
>> > > > > @@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>> > > > >        end_addr = pmd_addr_end(addr, vma->vm_end);
>> > > > >        max_nr = (end_addr - addr) >> PAGE_SHIFT;
>> > > > > 
>> > > > > -      /* We only support lazyfree batching for now ... */
>> > > > > -      if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
>> > > > > +      /* We only support lazyfree or file folios batching for now ... */
>> > > > > +      if (folio_test_anon(folio) && folio_test_swapbacked(folio))
>> > > > >                return 1;
>> > > > > +
>> > > > >        if (pte_unused(pte))
>> > > > >                return 1;
>> > > > > 
>> > > > > @@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> > > > >                         *
>> > > > >                         * See Documentation/mm/mmu_notifier.rst
>> > > > >                         */
>> > > > > -                      dec_mm_counter(mm, mm_counter_file(folio));
>> > > > > +                      add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
>> > > > >                }
>> > > > > discard:
>> > > > >                if (unlikely(folio_test_hugetlb(folio))) {
>> > > > > --
>> > > > > 2.47.3
>> > > > > 
>> > > > 
>> > > > Hi, Baolin
>> > > > 
>> > > > When reading your patch, I come up one small question.
>> > > > 
>> > > > The current try_to_unmap_one() has the following structure:
>> > > > 
>> > > >      try_to_unmap_one()
>> > > >          while (page_vma_mapped_walk(&pvmw)) {
>> > > >              nr_pages = folio_unmap_pte_batch()
>> > > > 
>> > > >              if (nr_pages == folio_nr_pages(folio))
>> > > >                  goto walk_done;
>> > > >          }
>> > > > 
>> > > > I am wondering what happens when nr_pages > 1 but nr_pages != folio_nr_pages(),
>> > > > e.g. when the folio crosses a PMD boundary or extends past vma->vm_end, so
>> > > > only part of it can be batched.
>> > > > 
>> > > > If my understanding is correct, page_vma_mapped_walk() would resume from
>> > > > (pvmw->address + PAGE_SIZE) in the next iteration, but we have already cleared
>> > > > the PTEs up to (pvmw->address + nr_pages * PAGE_SIZE), right?
>> > > > 
>> > > > I am not sure my understanding is correct; if it is, is there a reason not to
>> > > > skip the already-cleared range?
>> > > 
>> > > I don't quite understand your question. For nr_pages > 1 but not equal
>> > > to folio_nr_pages(), page_vma_mapped_walk() will skip the nr_pages - 1 PTEs inside.
>> > > 
>> > > take a look:
>> > > 
>> > > next_pte:
>> > >                 do {
>> > >                         pvmw->address += PAGE_SIZE;
>> > >                         if (pvmw->address >= end)
>> > >                                 return not_found(pvmw);
>> > >                         /* Did we cross page table boundary? */
>> > >                         if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
>> > >                                 if (pvmw->ptl) {
>> > >                                         spin_unlock(pvmw->ptl);
>> > >                                         pvmw->ptl = NULL;
>> > >                                 }
>> > >                                 pte_unmap(pvmw->pte);
>> > >                                 pvmw->pte = NULL;
>> > >                                 pvmw->flags |= PVMW_PGTABLE_CROSSED;
>> > >                                 goto restart;
>> > >                         }
>> > >                         pvmw->pte++;
>> > >                 } while (pte_none(ptep_get(pvmw->pte)));
>> > > 
>> > 
>> > Yes, we handle it in page_vma_mapped_walk() now. Since the already-cleared PTEs
>> > are pte_none(), they will be skipped.
>> > 
>> > I mean maybe we could skip the cleared range directly in try_to_unmap_one(), for example:
>> > 
>> > diff --git a/mm/rmap.c b/mm/rmap.c
>> > index 9e5bd4834481..ea1afec7c802 100644
>> > --- a/mm/rmap.c
>> > +++ b/mm/rmap.c
>> > @@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> >                   */
>> >                  if (nr_pages == folio_nr_pages(folio))
>> >                          goto walk_done;
>> > +               else {
>> > +                       pvmw.address += PAGE_SIZE * (nr_pages - 1);
>> > +                       pvmw.pte += nr_pages - 1;
>> > +               }
>> >                  continue;
>> >   walk_abort:
>> >                  ret = false;
>> 
>> 
>> I feel this couples the PTE walk iteration with the unmap
>> operation, which does not seem right to me. It also appears
>> to matter only in corner cases.
>
>Agreed. There are likely no performance gains, so I also prefer to leave it as is.

Got it, thanks.

-- 
Wei Yang
Help you, Help me


