[PATCH] arm64: clear_page[s] using memset
Linus Walleij
linusw at kernel.org
Wed Apr 8 00:47:20 PDT 2026
On Tue, Apr 7, 2026 at 3:47 PM Catalin Marinas <catalin.marinas at arm.com> wrote:
> On Tue, Apr 07, 2026 at 11:25:55AM +0200, Linus Walleij wrote:
> > Quoting my own commit message hehe:
> >
> > > No performance regressions can be seen, the fastpath
> > > benchmarks differences are in the noise.
> >
> > This was tested on hardware with Ryan Roberts' fastpath tool.
>
> BTW, have you tried the perf bench mmap test again with the new
> clear_page? Both with single page and multiple pages scenarios. And
> ideally on more than one platform.
>
> Will pointed out (in a private chat) that current clear_page() uses
> non-temporal stores while memset() doesn't. It may not make any
> difference in practice but it would be good to have some numbers.

Hm, interesting point. The perf bench mmap test isn't part of
fastpath, but since it tends to come up I guess we can add it?

Ryan: is it easy to add this test to fastpath? Or easy enough for me
to do myself? I looked at the instructions but they were a bit
intimidating...

The test is the following:

We boot the kernel with a cmdline like this:
"default_hugepagesz=1G hugepagesz=1G hugepages=32" to make sure
we have ample hugepages. This was then tested with the same
command line as the original series:

  perf bench mem mmap -p 1GB -f demand -s 32GB -l 5

The first run was discarded as the memory hierarchy is cold on
the first run. Then I ran the above command 5 times and averaged
the throughput.
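
For anyone who wants to repeat the run-and-average procedure above, here is
a minimal sketch in Python. The helper names and the parse callback are
hypothetical and not part of perf or fastpath. The sketch does not assume
anything about the perf bench output format, so extracting the throughput
figure from the output is left to the caller.

```python
import statistics
import subprocess

# The benchmark invocation quoted above.
MMAP_CMD = ["perf", "bench", "mem", "mmap",
            "-p", "1GB", "-f", "demand", "-s", "32GB", "-l", "5"]


def average_warm_runs(samples, warmup=1):
    """Return the mean of samples after dropping the first `warmup` runs."""
    warm = list(samples)[warmup:]
    if not warm:
        raise ValueError("no samples left after discarding warm-up runs")
    return statistics.fmean(warm)


def run_benchmark(parse, cmd=MMAP_CMD, runs=5, warmup=1):
    """Run cmd (warmup + runs) times and average the warm results.

    parse is a caller-supplied (hypothetical) function that takes the
    combined stdout/stderr text of one run and returns its throughput
    as a number.
    """
    samples = []
    for _ in range(warmup + runs):
        result = subprocess.run(cmd, check=True, text=True,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)
        samples.append(parse(result.stdout))
    return average_warm_runs(samples, warmup)
```

With the defaults this runs the command six times and averages the last
five, which matches the procedure described: one discarded cold run, then
five measured runs.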

The x86 commit cb431accb36e51b64ce34b5cc4d5ed292895fd84
also mentions this test:

  perf bench mem memset -k 1GB -f default -s 16GB

I tried it on QEMU: no real benefit with either the previous patch or
this one, and no regressions either. (x86 passes -f x86-64-stosq,
which selects an optimized memset implementation.)

Yours,
Linus Walleij