[RFC PATCH 00/10] Add Fujitsu A64FX soc entry/hardware barrier driver

misono.tomohiro at fujitsu.com
Tue Jan 12 05:24:48 EST 2021


Hi, 

First of all, thank you both for all the comments (my replies continue below).

> On Fri, Jan 8, 2021 at 1:54 PM Mark Rutland <mark.rutland at arm.com> wrote:
> > On Fri, Jan 08, 2021 at 07:52:31PM +0900, Misono Tomohiro wrote:
> > > (Resend as cover letter title was missing in the first time. Sorry for noise)
> > >
> > > This series adds Fujitsu A64FX SoC entry in drivers/soc and hardware
> > > barrier driver for it.
> > >
> > > [Driver Description]
> > >  A64FX CPU has several functions for HPC workload and hardware barrier
> > >  is one of them. It is a mechanism to realize fast synchronization by
> > >  PEs belonging to the same L3 cache domain by using implementation
> > >  defined hardware registers.
> > >  For more details, see A64FX HPC extension specification in
> > >  https://github.com/fujitsu/A64FX
> > >
> > >  The driver mainly offers a set of ioctls to manipulate related registers.
> > >  Patch 1-9 implements driver code and patch 10 finally adds kconfig,
> > >  Makefile and MAINTAINER entry for the driver.
> >
> > I have a number of concerns here, and at a high level, I do not think
> > that this is something Linux can reasonably support in its current form.
> > Sorry if this comes across as harsh; I appreciate the work that has gone
> > into this, and the effort to try to upstream support is great -- my
> > concerns are with the overall picture.
> >
> > As a general rule, we avoid the use of IMPLEMENTATION DEFINED features
> > in Linux, as they pose a number of correctness/safety challenges and
> > come with a potentially significant long term maintenance burden that is
> > generally not justified by the features themselves. For example, such
> > features are not usable under virtualization (where a hypervisor may set
> > HCR_EL2.TIDCP, or fail to context-switch state that it is unaware of).
> 
> I am somewhat less concerned about the feature being implementation
> defined than I am about adding a custom user interface for one
> platform.
> 
> In the end, anything outside of the CPU core that ends up in a SoC
> is implementation defined, and this is usually not a problem as long
> as we have an abstraction in the kernel that hides the details from
> the user, and the system is still functional if the implementation is
> turned off for whatever reason.

Understood. However, I don't know of any other processors with similar
features at this point, so it is hard to provide a common abstraction
interface. I would appreciate it if anyone could share any information.

> > Secondly, the intended usage model appears to expose this to EL0 for
> > direct access, and the code seems to depend on threads being pinned, but
> > AFAICT this is not enforced and there is no provision for
> > context-switch, thread migration, or interaction with ptrace. I fear
> > this is going to be very fragile in practice, and that extending that
> > support in future will require much more complexity than is currently
> > apparent, with potentially invasive changes to arch code.
> 
> Right, this is the main problem I see, too. I had not even realized
> that this will have to tie in with user space threads in some form, but
> you are right that once this has to interact with the CPU scheduler,
> it all breaks down.

This observation is right. I thought that adding support for context
switching etc. for implementation defined registers would require core arch
code changes, which would be far less acceptable. So I tried to confine the
code changes to a module, at the cost of these restrictions.

Regarding direct access from EL0, it is necessary for realizing fast
synchronization, because it lets the synchronization logic in a user
application check whether all threads have reached the synchronization point
without switching to the kernel.
Also, in multi-threaded HPC applications it is common for each running thread
to be bound to one PE.
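
To illustrate the usage model described above, here is a minimal user space
sketch. It is not the driver's interface and does not touch the A64FX
registers: a sense-reversing spin barrier built on C11 atomics stands in for
the hardware barrier, and pthread_setaffinity_np() binds each thread to one
CPU. All names (spin_barrier, pin_self_to_cpu, run_barrier_demo) are
hypothetical and chosen only for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREADS 64

/* Hypothetical software barrier standing in for the hardware one. */
struct spin_barrier {
    atomic_int  count;    /* threads still to arrive in this round */
    atomic_bool sense;    /* flips each time a round completes */
    int         nthreads;
};

struct worker_arg {
    struct spin_barrier *b;
    atomic_int *arrivals; /* shared counter used only for checking */
    atomic_int *errors;
    int cpu, rounds;
};

static void spin_barrier_wait(struct spin_barrier *b, bool *local_sense)
{
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arrival: reset the counter, then release the others. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            ; /* poll in user space, no system call */
    }
}

/* Bind the calling thread to one CPU; returns 0 on success. */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *worker(void *p)
{
    struct worker_arg *a = p;
    bool local_sense = false;

    (void)pin_self_to_cpu(a->cpu); /* best effort in this sketch */
    for (int i = 0; i < a->rounds; i++) {
        atomic_fetch_add(a->arrivals, 1);
        spin_barrier_wait(a->b, &local_sense);
        /* After round i, every thread must have arrived i + 1 times. */
        if (atomic_load(a->arrivals) < (i + 1) * a->b->nthreads)
            atomic_fetch_add(a->errors, 1);
    }
    return NULL;
}

/* Returns the total number of arrivals, or -1 if a check failed. */
static int run_barrier_demo(int nthreads, int rounds)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg arg[MAX_THREADS];
    struct spin_barrier b;
    atomic_int arrivals, errors;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_init(&b.count, nthreads);
    atomic_init(&b.sense, false);
    b.nthreads = nthreads;
    atomic_init(&arrivals, 0);
    atomic_init(&errors, 0);

    for (int i = 0; i < nthreads; i++) {
        arg[i] = (struct worker_arg){ &b, &arrivals, &errors, i, rounds };
        int rc = pthread_create(&tid[i], NULL, worker, &arg[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    return atomic_load(&errors) ? -1 : atomic_load(&arrivals);
}
```

Calling run_barrier_demo(2, 50) runs two threads through 50 barrier rounds.
The waiting threads only poll a shared flag, so the fast path never enters the
kernel. The same property is why the sketch behaves badly if threads are not
bound to separate CPUs: a spinning thread can keep the CPU away from the
thread it is waiting for.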

> One way I can imagine this working out is to tie into the cpuset
> mechanism that is used for isolating threads to CPU cores, and
> then provide a cpuset interface that has the desired behavior
> but that can fall back to a generic implementation with the same
> or stronger (but normally slower) semantics.

I'm not sure if this approach is feasible, but I will try to look into it.

> > Thirdly, this requires userspace software to be intimately familiar with
> > the HW platform that it is running on (both in terms of using IMP-DEF
> > instructions and needing to know the physical layout), rather than being
> > generic and portable, which I don't believe is something that we wish to
> > encourage.  I also think this is unlikely to be supported by generic
> > software because of the lack of portability, and consequently I struggle
> > to believe that this will see significant usage.
> 
> Agreed as well.

It may be possible to trap accesses to these implementation defined registers
and fall back to some logic in the driver. The problem is that other
processors might use the same IMP-DEF registers for a different purpose.

Regards,
Tomohiro


More information about the linux-arm-kernel mailing list