[PATCH v9 06/14] mm: multi-gen LRU: minimal implementation

Barry Song 21cnbao at gmail.com
Thu Mar 24 01:13:48 PDT 2022


On Thu, Mar 24, 2022 at 7:24 PM Yu Zhao <yuzhao at google.com> wrote:
>
> On Wed, Mar 23, 2022 at 1:47 AM Barry Song <21cnbao at gmail.com> wrote:
> >
> > On Sat, Mar 19, 2022 at 4:11 PM Yu Zhao <yuzhao at google.com> wrote:
> > >
> > > On Fri, Mar 18, 2022 at 9:01 PM Barry Song <21cnbao at gmail.com> wrote:
> > > >
> > > > > +static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
> > > > > +{
> > > > > +       unsigned long old_flags, new_flags;
> > > > > +       int type = folio_is_file_lru(folio);
> > > > > +       struct lru_gen_struct *lrugen = &lruvec->lrugen;
> > > > > +       int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
> > > > > +
> > > > > +       do {
> > > > > +               new_flags = old_flags = READ_ONCE(folio->flags);
> > > > > +               VM_BUG_ON_FOLIO(!(new_flags & LRU_GEN_MASK), folio);
> > > > > +
> > > > > +               new_gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
> > > > > +               new_gen = (old_gen + 1) % MAX_NR_GENS;
> > > >
> > > > new_gen is assigned twice. I assume you mean
> > > >                old_gen = ((new_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
> > > >                new_gen = (old_gen + 1) % MAX_NR_GENS;
> > > >
> > > > Or do you always mean new_gen = lru_gen_from_seq(min_seq) + 1?
> > >
> > > Thanks a lot for your attention to details!
> > >
> > > The first line should be in the next patch, but I overlooked it
> > > during the last refactoring:
> >
> > Thanks for the clarification. So an unmapped file-backed page that is
> > accessed only through system calls will always be in either min_seq or
> > min_seq + 1? It has no chance of being in max_seq, unlike a faulted-in
> > mapped file page?
>
> That's right. The rationale is documented here under the `Assumptions`
> section [1]. This is also related to Aneesh's question about why MGLRU
> doesn't need additional heuristics for VM_EXEC pages [2]. Otherwise,
> unmapped file pages would weaken the protection of executable pages
> under heavy buffered I/O workloads like Java NIO.

OK, this is probably right.
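
For the record, once that stray first assignment moves to the next
patch, I read the folio_inc_gen() loop quoted above as below. This is
just my sketch of the quoted code: the tail of the loop is elided, and
I'm assuming it retries via cmpxchg() on folio->flags:

	do {
		new_flags = old_flags = READ_ONCE(folio->flags);
		VM_BUG_ON_FOLIO(!(new_flags & LRU_GEN_MASK), folio);

		/* always promote to exactly one generation above min_seq */
		new_gen = (old_gen + 1) % MAX_NR_GENS;
		...
	} while (cmpxchg(&folio->flags, old_flags, new_flags) != old_flags);

That also makes your point concrete: no matter how many times an
unmapped page is accessed through syscalls, it can never climb past
min_seq + 1.
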
I will also run a test that maltreats unmapped pages in the vanilla
LRU; the PoC code is below (not tested yet):

Subject: [PATCH 1/1] mm: vmscan: maltreat unmapped file-backed pages

[This patch has not been tested yet.]

A lesson we learned from MGLRU is that mapped file-backed pages are
much more important than unmapped ones. So this patch doesn't move an
unmapped page to the active list on its second access; instead, it
keeps the page on the inactive list. We abuse PG_workingset to tell
memory reclaim that the page is a relatively hot file-backed page, so
reclaim keeps it on the inactive list instead of discarding it.

---
 mm/swap.c   | 34 ++++++++++++++++++++++------------
 mm/vmscan.c |  6 ++++--
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index e65e7520bebf..cb0c6e704f2e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -470,18 +470,28 @@ void folio_mark_accessed(struct folio *folio)
 		 * evictable page accessed has no effect.
 		 */
 	} else if (!folio_test_active(folio)) {
-		/*
-		 * If the page is on the LRU, queue it for activation via
-		 * lru_pvecs.activate_page. Otherwise, assume the page is on a
-		 * pagevec, mark it active and it'll be moved to the active
-		 * LRU on the next drain.
-		 */
-		if (folio_test_lru(folio))
-			folio_activate(folio);
-		else
-			__lru_cache_activate_folio(folio);
-		folio_clear_referenced(folio);
-		workingset_activation(folio);
+		if (folio_mapped(folio)) {
+			/*
+			 * If the mapped page is on the LRU, queue it for
+			 * activation via lru_pvecs.activate_page. Otherwise,
+			 * assume the page is on a pagevec, mark it active and
+			 * it'll be moved to the active LRU on the next drain.
+			 */
+			if (folio_test_lru(folio))
+				folio_activate(folio);
+			else
+				__lru_cache_activate_folio(folio);
+			folio_clear_referenced(folio);
+			workingset_activation(folio);
+		} else {
+			/*
+			 * We maltreat unmapped file-backed pages and abuse the
+			 * PG_workingset flag to tell the eviction that this is
+			 * a relatively hot file page, so the eviction can move
+			 * it back to the head of the inactive list.
+			 */
+			folio_set_workingset(folio);
+		}
 	}
 	if (folio_test_idle(folio))
 		folio_clear_idle(folio);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d6f3c9812f97..56a66eb4a3f7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1393,12 +1393,14 @@ enum page_references {
 static enum page_references page_check_references(struct page *page,
 						  struct scan_control *sc)
 {
-	int referenced_ptes, referenced_page;
+	int referenced_ptes, referenced_page, workingset;
 	unsigned long vm_flags;
 
 	referenced_ptes = page_referenced(page, 1, sc->target_mem_cgroup,
 					  &vm_flags);
 	referenced_page = TestClearPageReferenced(page);
+	workingset = page_is_file_lru(page) && !page_mapped(page) &&
+		     TestClearPageWorkingset(page);
 
 	/*
 	 * Mlock lost the isolation race with us.  Let try_to_unmap()
@@ -1438,7 +1440,7 @@ static enum page_references page_check_references(struct page *page,
 
 	/* Reclaim if clean, defer dirty pages to writeback */
 	if (referenced_page && !PageSwapBacked(page))
-		return PAGEREF_RECLAIM_CLEAN;
+		return workingset ? PAGEREF_KEEP : PAGEREF_RECLAIM_CLEAN;
 
 	return PAGEREF_RECLAIM;
 }
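
For reference, the workload I plan to use to exercise this path is just
buffered reads that touch the same page-cache pages twice, so that
folio_mark_accessed() runs repeatedly on unmapped file-backed pages. A
minimal userspace sketch ("/tmp/testfile" is a made-up placeholder;
pre-create the file and keep it small enough to stay cached):

/* read-twice.c: touch the same page-cache pages repeatedly via read(2),
 * so that folio_mark_accessed() sees unmapped file-backed pages more
 * than once.
 */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int pass;
	int fd = open("/tmp/testfile", O_RDONLY);	/* placeholder path */

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * Two sequential passes: the second pass hits pages that are
	 * already in the page cache but never mapped, which is exactly
	 * the case the PoC patch demotes.
	 */
	for (pass = 0; pass < 2; pass++) {
		if (lseek(fd, 0, SEEK_SET) < 0) {
			perror("lseek");
			return 1;
		}
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		if (n < 0) {
			perror("read");
			return 1;
		}
	}

	close(fd);
	return 0;
}

Watching pgactivate and the workingset_* counters in /proc/vmstat
before and after should show whether the second access still activates
those pages.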

>
> [1] https://lore.kernel.org/linux-mm/20220309021230.721028-15-yuzhao@google.com/
> [2] https://lore.kernel.org/linux-mm/CAOUHufYfpiGdLSdffvzDqaD5oYFG99oDJ2xgQd2Ph77OFR5NAA@mail.gmail.com/

Thanks
Barry


