[RESEND PATCH v7 00/10] Small-sized THP for anonymous memory

John Hubbard jhubbard at nvidia.com
Wed Nov 22 22:28:04 PST 2023


On 11/22/23 08:29, Ryan Roberts wrote:
...
> Prerequisites
> =============
> 
> Some work items identified as being prerequisites are listed on page 3 at [8].
> The summary is:
> 
> | item                          | status                  |
> |:------------------------------|:------------------------|
> | mlock                         | In mainline (v6.7)      |
> | madvise                       | In mainline (v6.6)      |
> | compaction                    | v1 posted [9]           |
> | numa balancing                | Investigated: see below |
> | user-triggered page migration | In mainline (v6.7)      |
> | khugepaged collapse           | In mainline (NOP)       |
> 
> On NUMA balancing, which currently ignores any PTE-mapped THPs it encounters:
> John Hubbard has investigated this and concluded that A) it is not clear at the
> moment what a better policy might be for PTE-mapped THP, and B) he questions
> whether this should really be considered a prerequisite, given that no
> regression is caused for the default "small-sized THP disabled" case, and there
> is no correctness issue when it is enabled - it's just a potential for
> non-optimal performance. (John, please do elaborate if I haven't captured this
> correctly!)

That's accurate. I actually want to continue looking into this (Mel
Gorman's recent replies to v6 provided helpful touchstones for the NUMA
reasoning that has led up to the present day), and maybe at least bring
PTE-mapped THPs into rough parity with PMD-mapped THPs with respect to
NUMA balancing.

But that really doesn't seem like something that needs to happen first,
especially since the outcome might even be "first, do no harm": as in,
it's better as-is. We'll see.

> 
> If there are no disagreements about removing NUMA balancing from the list, then
> that just leaves compaction, which is in review on the list at the moment.
> 
> I really would like to get this series (and its remaining compaction
> prerequisite) in for v6.8. I accept that it may be a bit optimistic at this
> point, but let's see where we get to with review?
> 
> 
> Testing
> =======
> 
> The series includes patches for mm selftests to enlighten the cow and khugepaged
> tests to explicitly test with small-order THP, in the same way that PMD-order
> THP is tested. The new tests all pass, and no regressions are observed in the mm
> selftest suite. I've also run my usual kernel compilation and JavaScript
> benchmarks without any issues.
> 
> Refer to my performance numbers posted with v6 [6]. (These are for small-sized
> THP only - they do not include the arm64 contpte follow-on series).
> 
> John Hubbard at Nvidia has indicated dramatic 10x performance improvements for
> some workloads at [10]. (Observed using v6 of this series as well as the arm64
> contpte series).
> 

Testing continues. Some workloads do even better than 10x; it's quite
remarkable and glorious to see. :)  I can send more perf data, perhaps
in a few days or a week, if there is still doubt about the benefits.

That was with the v6 series, though. I'm about to set up and run with
v7, and expect to provide a Tested-by tag for functionality sometime
soon (in the next few days), if machine availability works out as
expected.


thanks,
-- 
John Hubbard
NVIDIA



