[PATCH v14 07/14] mm: multi-gen LRU: exploit locality in rmap

Nadav Amit nadav.amit at gmail.com
Thu Sep 1 02:18:10 PDT 2022



> On Aug 15, 2022, at 12:13 AM, Yu Zhao <yuzhao at google.com> wrote:
> 
> Searching the rmap for PTEs mapping each page on an LRU list (to test
> and clear the accessed bit) can be expensive because pages from
> different VMAs (PA space) are not cache friendly to the rmap (VA
> space). For workloads mostly using mapped pages, searching the rmap
> can incur the highest CPU cost in the reclaim path.

Impressive work. Sorry if my feedback is not timely.

Just one minor point to consider, which can be left for a later cleanup.

> 
> +	for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
> +		unsigned long pfn;
> +
> +		pfn = get_pte_pfn(pte[i], pvmw->vma, addr);
> +		if (pfn == -1)
> +			continue;
> +
> +		if (!pte_young(pte[i]))
> +			continue;
> +
> +		folio = get_pfn_folio(pfn, memcg, pgdat);
> +		if (!folio)
> +			continue;
> +
> +		if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
> +			continue;
> +

You have already checked that the PTE is young (the !pte_young() case was
skipped above), so this check seems redundant. I do not see how the accessed
bit could be cleared in between, since you hold the ptl. IOW, there is no
need for the "if" and "continue".
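
To illustrate the point, here is a minimal user-space sketch of the loop with
the second check dropped. It is not kernel code: model_pte_t,
MODEL_PTE_ACCESSED and the model_* helpers are made up for illustration, the
lock is assumed to be held by the caller, and hardware updates to the PTE are
not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only: a "PTE" is a plain 64-bit word. */
typedef uint64_t model_pte_t;

/* Mirrors the x86 accessed bit (bit 5); an assumption of this model. */
#define MODEL_PTE_ACCESSED ((model_pte_t)1 << 5)

static int model_pte_young(model_pte_t pte)
{
	return (pte & MODEL_PTE_ACCESSED) != 0;
}

/*
 * Clears the accessed bit and reports whether it was set. Non-atomic:
 * the model assumes the caller holds the lock.
 */
static int model_test_and_clear_young(model_pte_t *ptep)
{
	int young = model_pte_young(*ptep);

	*ptep &= ~MODEL_PTE_ACCESSED;
	return young;
}

/* Walks n PTEs and returns how many young ones were found and cleared. */
static size_t model_scan(model_pte_t *pte, size_t n)
{
	size_t i, young = 0;

	for (i = 0; i < n; i++) {
		int was_young;

		if (!model_pte_young(pte[i]))
			continue;

		/*
		 * The PTE was young a moment ago and, in this model, nothing
		 * can clear the bit while the lock is held, so the return
		 * value carries no new information.
		 */
		was_young = model_test_and_clear_young(&pte[i]);
		assert(was_young);
		(void)was_young;

		young++;
	}

	return young;
}
```

The assert() only documents the invariant; in the patch the equivalent
would be calling ptep_test_and_clear_young() and ignoring its return value.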

This also makes me wonder whether a separate ptep_clear_young() could help
slightly: the accessed bit is more of an estimate anyway, and a dedicated
helper could enable optimizations.

On x86, for instance, if the PTE is dirty, we may be able to clear the
accessed bit without an atomic operation, which should be faster.
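
A rough user-space sketch of that idea, using C11 atomics in place of the
real x86 helpers. Everything named sketch_* or SKETCH_* is hypothetical; the
bit positions follow x86 (accessed is bit 5, dirty is bit 6). The assumption
behind the dirty case is that, with the ptl held, the only thing a plain
store can lose is a concurrent hardware update, and with the dirty bit
already set that can only be a re-set of the accessed bit, which is an
estimate anyway.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow x86; the names are made up for this sketch. */
#define SKETCH_PTE_ACCESSED ((uint64_t)1 << 5)
#define SKETCH_PTE_DIRTY    ((uint64_t)1 << 6)

/*
 * Hypothetical clear-only helper: it does not report the old value,
 * which is what leaves room for the cheaper path.
 */
static void sketch_ptep_clear_young(_Atomic uint64_t *ptep)
{
	uint64_t old;

	assert(ptep != NULL);
	old = atomic_load_explicit(ptep, memory_order_relaxed);

	/* Already old: nothing to clear. */
	if (!(old & SKETCH_PTE_ACCESSED))
		return;

	if (old & SKETCH_PTE_DIRTY) {
		/* Dirty: a plain store, no locked read-modify-write. */
		atomic_store_explicit(ptep, old & ~SKETCH_PTE_ACCESSED,
				      memory_order_relaxed);
		return;
	}

	/* Clean: atomic RMW so a concurrent dirty-bit set is not lost. */
	atomic_fetch_and_explicit(ptep, ~SKETCH_PTE_ACCESSED,
				  memory_order_relaxed);
}
```

Whether the plain-store path is actually safe and faster on real hardware
would need to be checked against the architecture manuals and measured.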



