[PATCH v5 0/6] Add hardware prefetch control driver for A64FX and x86

'Greg KH' gregkh at linuxfoundation.org
Tue Jun 14 05:34:43 PDT 2022


On Tue, Jun 14, 2022 at 11:55:39AM +0000, tarumizu.kohei at fujitsu.com wrote:
> Thanks for the comment.
> 
> > Why does userspace want to even do this?
> 
> This is because the optimal settings may differ from application to
> application.

That's not ok.  Linux is a "general purpose" operating system and needs
to work well for all applications.  Doing application-specific tuning
based on the specific hardware like this is a nightmare for users, and
will be for you as you will now have to support this specific model to
work correctly on all future kernel releases for the next 20+ years.
Are you willing to do that?

> Examples of performance improvements for applications with simple
> memory access characteristics are described in the [merit] section.
> However, some applications have complex characteristics, so it is
> difficult to predict if an application will improve without actually
> trying it out.

Then perhaps it isn't anything that they should try out :)

Shouldn't the kernel know how the application works (based on the
resources it asks for) and tune itself based on that automatically?

If not, how is a user supposed to know how to do this?

> This is not necessary for all applications. However, I want to provide
> a minimal interface for those who want to improve their application's
> performance even a little.
> 
> > How will they do this?
> 
> I assume it will be used to tune a specific core and then run an
> application on that core. An example follows.
> 
> 1) The user tunes the parameters of a specific core before executing
>    the program.
> 
> ```
> # echo 1024 > /sys/devices/system/cpu/cpu12/cache/index0/prefetch_control/stream_detect_prefetcher_dist
> # echo 1024 > /sys/devices/system/cpu/cpu12/cache/index2/prefetch_control/stream_detect_prefetcher_dist
> # echo 1024 > /sys/devices/system/cpu/cpu13/cache/index0/prefetch_control/stream_detect_prefetcher_dist
> # echo 1024 > /sys/devices/system/cpu/cpu13/cache/index2/prefetch_control/stream_detect_prefetcher_dist
> ```

What is "1024" here?  Where is any of this documented?  And why these
specific sysfs files and not others?

> 2) Execute the program bound to the target core.
> 
> ```
> # taskset -c 12-13 a.out
> ```
> 
> If the interface is exposed, the user can develop a library that
> performs steps 1) and 2) instead.

If you have no such user today, nor a library, how do you know any of
this works well?  And again, how will you support this going forward?
Or is this specific api only going to be for one specific piece of
hardware and never any future ones?
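
For reference, steps 1) and 2) quoted above could be wrapped in a small
shell helper along these lines.  This is only a sketch: the function
name set_prefetch_dist and the SYSFS_CPU_ROOT override are hypothetical,
the attribute path is copied from the quoted example and only exists
with this patch series applied, and the meaning of the value written is
not defined anywhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch series.
# SYSFS_CPU_ROOT can be overridden to dry-run against a scratch directory.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# set_prefetch_dist DIST CPU...
# Writes DIST to the index0 and index2 prefetch-distance attributes of
# each listed CPU (the same files as in the quoted example).
# Returns non-zero as soon as one attribute is missing or not writable.
set_prefetch_dist() {
    dist="$1"
    shift
    for cpu in "$@"; do
        for index in index0 index2; do
            attr="$SYSFS_CPU_ROOT/cpu$cpu/cache/$index/prefetch_control/stream_detect_prefetcher_dist"
            if [ ! -w "$attr" ]; then
                echo "not writable: $attr" >&2
                return 1
            fi
            echo "$dist" > "$attr" || return 1
        done
    done
}
```

Run as root, `set_prefetch_dist 1024 12 13 && taskset -c 12-13 ./a.out`
would reproduce the two quoted steps, and would not start the program at
all if any of the writes failed.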

> > What programs will do this?
> 
> It is assumed to be used by programs that perform many continuous
> memory accesses. It may be useful for other applications, but I can't
> explain them in detail right away.

So you haven't tested this on any real applications?  We need real users
before being able to add new apis.  Otherwise we can just remove the
apis :)

> > And why isn't this just automatic, and why does this hardware require
> > manual intervention to work properly?
> 
> It is difficult for the hardware to determine the optimal parameters
> in advance. Therefore, I think that the register is provided to change
> the behavior of the hardware.

Kernel programming for a general purpose operating system is hard, but
it is possible :)

good luck!

greg k-h


