[RFC PATCH v1 00/57] Boot-time page size selection for arm64
David Hildenbrand
david at redhat.com
Wed Oct 16 08:16:42 PDT 2024
> Performance Testing
> ===================
>
> I've run some limited performance benchmarks:
>
> First, a real-world benchmark that causes a lot of page table manipulation (and
> therefore we would expect to see regression here if we are going to see it
> anywhere); kernel compilation. It barely registers a change. Values are times,
> so smaller is better. All relative to base-4k:
>
> |             | kern    | kern    | user    | user    | real    | real    |
> | config      | mean    | stdev   | mean    | stdev   | mean    | stdev   |
> |-------------|---------|---------|---------|---------|---------|---------|
> | base-4k     |    0.0% |    1.1% |    0.0% |    0.3% |    0.0% |    0.3% |
> | compile-4k  |   -0.2% |    1.1% |   -0.2% |    0.3% |   -0.1% |    0.3% |
> | boot-4k     |    0.1% |    1.0% |   -0.3% |    0.2% |   -0.2% |    0.2% |
>
> The Speedometer JavaScript benchmark also shows no change. Values are runs per
> min, so bigger is better. All relative to base-4k:
>
> | config      | mean    | stdev   |
> |-------------|---------|---------|
> | base-4k     |    0.0% |    0.8% |
> | compile-4k  |    0.4% |    0.8% |
> | boot-4k     |    0.0% |    0.9% |
>
> Finally, I've run some microbenchmarks known to stress page table manipulations
> (originally from David Hildenbrand). The fork test maps/allocs 1G of anon
> memory, then measures the cost of fork(). The munmap test maps/allocs 1G of anon
> memory then measures the cost of munmap()ing it. The fork test is known to be
> extremely sensitive to any changes that cause instructions to be aligned
> differently in cachelines. When using this test for other changes, I've seen
> double digit regressions for the slightest thing, so 12% regression on this test
> is actually fairly good. This likely represents the extreme worst case for
> regressions that will be observed across other microbenchmarks (famous last
> words). Values are times, so smaller is better. All relative to base-4k:
>
... and here I am, worrying about much smaller degradation in these
micro-benchmarks ;) You're right, these are pure micro-benchmarks, and
while 12% does sound like "much", even stupid compiler code movement can
result in such changes in the fork() micro-benchmark.
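
For reference, a minimal sketch of what a fork test of this kind might look
like. This is not the actual benchmark used for the numbers above; the helper
name time_fork_ns() and the choice of CLOCK_MONOTONIC are assumptions made for
illustration. It maps and touches a region of anonymous memory, then times only
the fork() call itself in the parent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original benchmark): map and populate
 * `bytes` of anonymous memory, then return the time the parent spends in
 * fork() in nanoseconds, or -1 on error.
 */
static int64_t time_fork_ns(size_t bytes)
{
    struct timespec t0, t1;
    pid_t pid;
    char *buf;

    assert(bytes > 0);

    buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    /* Touch every page so the page tables are populated before fork(). */
    memset(buf, 1, bytes);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pid = fork();
    if (pid == 0)
        _exit(0); /* child: exit immediately, we only time the parent */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (pid < 0) {
        munmap(buf, bytes);
        return -1;
    }

    /* Reap the child and release the mapping outside the timed region. */
    waitpid(pid, NULL, 0);
    munmap(buf, bytes);

    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
           (t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_fork_ns(1UL << 30) would approximate the 1G case described above.
A single run is noisy, so a real measurement would repeat the call many times
and report mean and stdev.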

So I think this is just fine, and actually "surprisingly" small. And,
there is even a way to statically compile a page size and not worry
about that at all.

As discussed ahead of time, I consider this change very valuable. In
RHEL, the biggest issue is actually the test matrix, which cannot really
be reduced significantly ... but it will make shipping/packaging easier.

CCing Don, who did the separate 64k RHEL flavor kernel.

--
Cheers,
David / dhildenb