[PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap
Nadav Amit
nadav.amit at gmail.com
Thu Sep 1 02:18:10 PDT 2022
> On Aug 15, 2022, at 12:13 AM, Yu Zhao <yuzhao at google.com> wrote:
>
> Searching the rmap for PTEs mapping each page on an LRU list (to test
> and clear the accessed bit) can be expensive because pages from
> different VMAs (PA space) are not cache friendly to the rmap (VA
> space). For workloads mostly using mapped pages, searching the rmap
> can incur the highest CPU cost in the reclaim path.
Impressive work. Sorry if my feedback is not timely.
Just one minor point to consider, which can be left for a later cleanup.
>
> +	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> +		unsigned long pfn;
> +
> +		pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> +		if (pfn == -1)
> +			continue;
> +
> +		if (!pte_young(pte[i]))
> +			continue;
> +
> +		folio = get_pfn_folio(pfn, memcg, pgdat);
> +		if (!folio)
> +			continue;
> +
> +		if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> +			continue;
> +
You have already checked that the PTE is young (old PTEs are skipped by the
pte_young() test above), so this check seems redundant. I do not see a way in
which the accessed bit can be cleared in between, since you hold the ptl. IOW,
there is no need for the "if" and "continue".
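To make the point concrete, here is a minimal userspace sketch, not kernel code. The toy_* names and the plain uint64_t "PTE" are stand-ins invented for illustration; only the bit position (bit 5, the x86 accessed bit) is taken from real hardware. It assumes the caller holds whatever lock plays the role of the ptl, so nothing else can clear the bit between the test and the clear.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a PTE; bit 5 is the x86 accessed-bit position. */
#define TOY_PTE_ACCESSED (UINT64_C(1) << 5)

/* Toy analogue of pte_young(): is the accessed bit set? */
static bool toy_pte_young(uint64_t pte)
{
	return (pte & TOY_PTE_ACCESSED) != 0;
}

/* Toy analogue of ptep_test_and_clear_young(): clear, report old state. */
static bool toy_test_and_clear_young(uint64_t *ptep)
{
	bool young = toy_pte_young(*ptep);

	*ptep &= ~TOY_PTE_ACCESSED;
	return young;
}

/*
 * Hypothetical simplified loop: old entries are skipped first, so the
 * return value of the test-and-clear is already known to be true and
 * is ignored instead of guarding another "continue".
 * Returns the number of young entries that were cleared.
 */
static size_t toy_clear_young_range(uint64_t *pte, size_t n)
{
	size_t young = 0;

	for (size_t i = 0; i < n; i++) {
		if (!toy_pte_young(pte[i]))
			continue;

		/* Assumption: lock held, so the bit is still set here. */
		(void)toy_test_and_clear_young(&pte[i]);
		young++;
	}

	return young;
}
```

With four toy entries of which two have bit 5 set, toy_clear_young_range() reports two and leaves every other bit untouched.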
It also makes me wonder whether having a separate ptep_clear_young() could
help slightly, since the accessed bit is more of an estimate anyway, and a
clear-only helper that does not report the old value can enable optimizations.
On x86, for instance, if the PTE is dirty, we may be able to clear the
accessed bit without an atomic operation, which should be faster.
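A rough userspace sketch of that idea follows, using C11 atomics on a toy 64-bit word. This is only one reading of the suggestion: toy_ptep_clear_young() is a hypothetical helper, not an existing kernel function, and the reasoning assumes the lock is held (no software writer) and that the only concurrent writer is hardware setting the accessed or dirty bit. If the dirty bit is already set, a racing hardware update can only set the accessed bit again, which we are clearing anyway, so a plain store loses nothing of value.

```c
#include <stdatomic.h>
#include <stdint.h>

/* x86 bit positions for the accessed and dirty flags in a PTE. */
#define TOY_PTE_ACCESSED (UINT64_C(1) << 5)
#define TOY_PTE_DIRTY    (UINT64_C(1) << 6)

/*
 * Hypothetical clear-only helper: it does not report the old state,
 * which is what leaves room for the cheaper path.
 */
static void toy_ptep_clear_young(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load_explicit(ptep, memory_order_relaxed);

	if (old & TOY_PTE_DIRTY) {
		/*
		 * Dirty already set: a plain (relaxed) store is enough,
		 * since a racing update could at most re-set the
		 * accessed bit. No locked read-modify-write is needed.
		 */
		atomic_store_explicit(ptep, old & ~TOY_PTE_ACCESSED,
				      memory_order_relaxed);
	} else {
		/*
		 * Dirty not set: use an atomic read-modify-write so a
		 * concurrent dirty-bit update is not overwritten.
		 */
		atomic_fetch_and_explicit(ptep, ~TOY_PTE_ACCESSED,
					  memory_order_relaxed);
	}
}
```

Both paths leave the word with the accessed bit cleared and all other bits intact; they differ only in whether the update is a single store or an atomic read-modify-write.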