[PATCH v4 0/6] Cache coherency management subsystem

Jonathan Cameron jonathan.cameron at huawei.com
Thu Oct 23 05:31:36 PDT 2025


On Wed, 22 Oct 2025 12:22:41 -0700
Andrew Morton <akpm at linux-foundation.org> wrote:

> On Wed, 22 Oct 2025 12:33:43 +0100 Jonathan Cameron <Jonathan.Cameron at huawei.com> wrote:
> 
> > Support system level interfaces for cache maintenance as found on some
> > ARM64 systems. This is needed for correct functionality during various
> > forms of memory hotplug (e.g. CXL). Typical hardware has MMIO interface
> > found via ACPI DSDT.
> > 
> > Includes parameter changes to cpu_cache_invalidate_memregion() but no
> > functional changes for architectures that already support this call.  

Hi Andrew,

> 
> I see additions to lib/ so presumably there is an expectation that
> other architectures might use this.

Absolutely. It's not ARM specific in any way. Given that, in at least
some implementations, this is part of the coherency fabric, and there
are past examples of people mixing and matching fabrics with CPU
architectures, it's more than possible that a given driver will be
applicable across different CPU architectures.

> 
> Please expand on this.  Any particular architectures in mind?  Any
> words of wisdom which maintainers of those architectures might benefit
> from?

My initial guess for a second architecture using it would be RISC-V,
but I don't know if anyone there yet cares about any of the use cases.

The short answer is that it depends on whether the architecture
requires 'one solution' or leaves it as a system problem where
a driver needs to be loaded to suit the particular implementation.

Longer answer follows:

There are two aspects to consider when deciding whether an architecture
might find this useful:

A) The use case.  For it to apply to an architecture, you need a
   requirement to support the content of memory presented at a PA
   changing without the host explicitly writing it.  That can happen
   for various reasons.
   - Late exposure of memory - security keys for pmem, for instance.
     Until those are programmed, any prefetchers will fill caches
     with garbage that needs clearing out.
   - Reprogramming of address decoders sitting beyond the point where
     Host Physical Addresses alone define what goes on.  This is the
     CXL case, where there is a translation from Host Physical Address
     to Device Physical Address which can change at runtime (a usage
     sketch follows below).
   - (not yet enabled) Inter-host sharing without hw coherency. It is
     necessary to flush local caches because someone changed the data
     under the hood; because that happened beyond the scope of the
     local host, normal cache flushing instructions might not do the
     job.  Hopefully we will have lighter weight solutions for this.
So the upshot today is that this is likely to apply only to server
architectures.
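
To make the CXL decoder case concrete, here is a minimal sketch of how
a caller might use the interface after reprogramming decoders.  The
range-based (res_desc, start, len) form is an assumption based on the
parameter changes this series mentions, so check the patches for the
final signature; the capability check and IORES_DESC_CXL come from the
existing in-tree interface.

/*
 * Sketch only: flush stale cachelines covering a window whose
 * HPA->DPA decode has just been reprogrammed.  The three-argument
 * signature is assumed from the cover letter, not guaranteed.
 */
#include <linux/ioport.h>
#include <linux/memregion.h>

static int flush_after_decoder_reprogram(phys_addr_t start, size_t len)
{
	if (!cpu_cache_has_invalidate_memregion())
		return -ENXIO;

	/* Write back + invalidate anything cached for the old decode */
	return cpu_cache_invalidate_memregion(IORES_DESC_CXL, start, len);
}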

B) Is there an architected solution for that architecture (i.e. is it
   in the CPU architecture spec)?  If there is 'one solution', then
   registering the arch callbacks directly is sufficient. This is
   true for x86, as there is a CPU instruction that performs the
   relevant operations.
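
For reference, this is roughly the shape of the existing x86
implementation (arch/x86/mm/pat/set_memory.c): the architecture has a
single instruction (WBINVD) that does the whole job, so no
per-implementation driver is needed.

int cpu_cache_invalidate_memregion(int res_desc)
{
	if (WARN_ON_ONCE(!cpu_cache_has_invalidate_memregion()))
		return -ENXIO;
	/* Write back and invalidate all caches on all CPUs */
	wbinvd_on_all_cpus();
	return 0;
}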

Arm decided (for now) not to go down the path of architecting this
in one of their architecture specs that licensees would then have
to comply with (I'll let James / others add more on that if they want).
There were already multiple hardware IPs out there providing this
feature as part of the coherency fabric.  Earlier versions of this
series mentioned an attempt to provide a firmware interface to hide
away the complexity, but that also turned out to be unnecessary, as
everyone with a use case had memory-mapped devices the kernel can
directly control.

So there will be multiple different implementations on ARM servers.
I doubt we'll even keep it completely consistent across different
HiSilicon CPU generations. As per the discussion with Conor, there
are multiple agents, each of which registers separately and has
no knowledge of the other instances. For now the ones I know of
are homogeneous for a given server, but it made no difference to
allow for heterogeneous cases (I emulated those to check).
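
As a strawman, per-agent registration might look something like the
following.  All names here are hypothetical - check the actual patches
for the real structures:

/* Opaque handle, one per coherency agent (e.g. one HHA instance) */
struct coherency_agent;

struct coherency_agent_ops {
	/* Kick off write-back + invalidate of [start, start + len) */
	int (*wbinv_start)(struct coherency_agent *agent,
			   phys_addr_t start, size_t len);
	/* Poll for completion of the previously started operation */
	int (*wbinv_done)(struct coherency_agent *agent);
};

/*
 * Each agent registers independently; the core doesn't care whether
 * the set of agents is homogeneous and simply fans out to all of them.
 */
int coherency_agent_register(struct coherency_agent *agent,
			     const struct coherency_agent_ops *ops);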

So for other architectures, it is a case of which path they want to
follow.  If they don't have existing instructions defined that work
for this, and have more than one implementer, then the approach seen
here should be useful. I think RISC-V doesn't have such an instruction,
so I'd expect this to be useful to them.  Not sure about other server
architectures; most of them today are much less diverse than ARM /
RISC-V, so a "one true solution" in an architecture spec is perhaps
more likely.

In the various review rounds, we've had some discussion of the requirements
implied by the current simple interface (no ordering, single operation in
flight).  So I'd not be surprised if we have to make things a little
cleverer in the long run.  The HiSilicon HHA hardware interface is very
simple, so for now I've supported what that (and the PSCI spec with sane
options - see v3) requires.
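
To illustrate that contract, the core fan-out can stay as simple as
the sketch below (hypothetical names again, building on the strawman
above): one mutex enforces a single operation in flight, the operation
is started on every agent and then completion is polled on every agent,
with no ordering guarantees between agents.

static DEFINE_MUTEX(coherency_lock);
static LIST_HEAD(coherency_agents);

struct coherency_agent {
	struct list_head node;		/* entry on coherency_agents */
	const struct coherency_agent_ops *ops;
};

int coherency_wbinv(phys_addr_t start, size_t len)
{
	struct coherency_agent *agent;
	int ret = 0, rc;

	/* Serialise: single operation in flight system-wide */
	mutex_lock(&coherency_lock);

	/* Start on every agent... */
	list_for_each_entry(agent, &coherency_agents, node) {
		rc = agent->ops->wbinv_start(agent, start, len);
		if (rc && !ret)
			ret = rc;
	}

	/* ...then wait for each to report completion. */
	list_for_each_entry(agent, &coherency_agents, node) {
		rc = agent->ops->wbinv_done(agent);
		if (rc && !ret)
			ret = rc;
	}

	mutex_unlock(&coherency_lock);
	return ret;
}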

> 
> > How to merge?  When this is ready to proceed (so subject to review
> > feedback on this version), I'm not sure what the best route into the
> > kernel is. Conor could take the lot via his tree for drivers/cache but
> > the generic changes perhaps suggest it might be better if Andrew
> > handles this?  Any merge conflicts in drivers/cache will be trivial
> > build file stuff. Or maybe even take it through one of the affected
> > trees such as CXL.  
> 
> Let's not split the series up.  Either CXL or Conor's tree is fine by
> me.

Thanks,

Jonathan