[PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86

Linus Walleij linus.walleij at linaro.org
Fri Jun 10 06:48:25 PDT 2022


On Tue, Jun 7, 2022 at 2:07 PM Kohei Tarumizu
<tarumizu.kohei at fujitsu.com> wrote:

> This patch series add sysfs interface to control CPU's hardware
> prefetch behavior for performance tuning from userspace for the
> processor A64FX and x86 (on supported CPU).

OK

> A64FX and some Intel processors have implementation-dependent register
> for controlling CPU's hardware prefetch behavior. A64FX has
> IMP_PF_STREAM_DETECT_CTRL_EL0[1], and Intel processors have MSR 0x1a4
> (MSR_MISC_FEATURE_CONTROL)[2].

Hardware prefetch (I guess of memory contents) is a memory hierarchy feature.

Linux has a memory hierarchy manager, conveniently named "mm",
developed by some of the smartest people I know. The main problem
it addresses is paging, but prefetching into the CPU from the
next lower level of the memory hierarchy is just another memory
hierarchy hardware feature, alongside hard disks, primary RAM etc.

> These registers cannot be accessed from userspace.

Good. The kernel manages hardware. If the memory hierarchy people have
userspace now doing stuff behind their back, through some special
interface, that makes their world more complicated.

This looks like it needs information from the generic memory manager,
from the scheduler, and possibly all the way down from the block
layer to do the right thing, so it has no business in userspace.
Have you seen mm/damon for example? Access to statistics for
memory access patterns seems really useful for tuning the behaviour
of this hardware. Just my €0.01.

If it does interact with userspace, I suppose it should use control
groups like everything else of this type (see e.g. mm/memcontrol.c),
not custom sysfs files.

Just an example from one of the patches:

+                       - "* Adjacent Cache Line Prefetcher Disable (R/W)"
+                           corresponds to the "adjacent_cache_line_prefetcher_enable"

I may only be at the "a little knowledge is dangerous" level on memory
manager topics, but I know for sure that they at times adjust the members
of structs to fit nicely on cache lines. And now this? It looks really useful
for kernel machinery that knows very well what needs to go into the cache
line next and when.

Talk to the people on linux-mm and memory maintainer Andrew Morton about
how to do this right; it's a really interesting feature! Also, given
that people say the memory hierarchy is an important part of the
performance of the Apple M1 (M2) silicon, wouldn't that machine have
this too?

Yours,
Linus Walleij



More information about the linux-arm-kernel mailing list