[PATCH v3 3/8] lib: Support ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION

Wed Oct 8 09:45:51 PDT 2025

On Mon, 8 Sep 2025 13:59:29 -0700
dan.j.williams at intel.com wrote:

> Jonathan Cameron wrote:
> > From: Yicong Yang <yangyicong at hisilicon.com>
> > 
> > ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION provides the mechanism for
> > invalidating certain memory regions in a cache-incoherent manner. Currently
> > this is used by NVDIMM and CXL memory drivers in cases where it is
> > necessary to flush all data from caches by physical address range.
> > 
> > In some architectures these operations are supported by system components
> > that may become available only later in boot as they are either present
> > on a discoverable bus, or via a firmware description of an MMIO interface
> > (e.g. ACPI DSDT). Provide a framework to handle this case.
> > 
> > Architectures can opt in to this support via
> > CONFIG_GENERIC_CPU_CACHE_MAINTENANCE.
> > 
> > Add a registration framework. Each driver provides an ops structure and
> > the first op is Write Back and Invalidate by PA Range. The driver may
> > over-invalidate.
> > 
> > An optional completion check operation is also provided. If present,
> > it should be called to ensure that the action has finished.
> > 
> > When multiple agents are present in the system, each should register with
> > this framework and the core code will issue the invalidate to all of them
> > before checking for completion on each. This avoids the need for
> > filtering in the core code, which can become complex when interleaving
> > (potentially across different cache coherency hardware) is in use, so it
> > is easier to tell everyone and let those who don't care do nothing.
> > 
> > Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron at huawei.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron at huawei.com>
> > ---
> > v3: Squash all the layering from v2 so that the infrastructure is
> >     always present.
> >     Suggestions on naming welcome. Note that the hardware I have
> >     available supports a much richer set of maintenance operations
> >     than Write Back and Invalidate, so I'd like a name that
> >     covers all reasonable maintenance operations.
> >     Use an allocation wrapper macro, based on the fwctl one to
> >     ensure that the first element of the allocated driver structure
> >     is a struct cache_coherency_device.
> >     Thanks to all who provided feedback.
> > ---
> >  include/linux/cache_coherency.h |  57 ++++++++++++++
> >  lib/Kconfig                     |   3 +
> >  lib/Makefile                    |   2 +
> >  lib/cache_maint.c               | 128 ++++++++++++++++++++++++++++++++
> >  4 files changed, 190 insertions(+)
> > 
> > diff --git a/include/linux/cache_coherency.h b/include/linux/cache_coherency.h
> > new file mode 100644
> > index 000000000000..cb195b17b6e6
> > --- /dev/null
> > +++ b/include/linux/cache_coherency.h
> > @@ -0,0 +1,57 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Cache coherency maintenance operation device drivers
> > + *
> > + * Copyright Huawei 2025
> > + */
> > +#ifndef _LINUX_CACHE_COHERENCY_H_
> > +#define _LINUX_CACHE_COHERENCY_H_
> > +
> > +#include <linux/list.h>
> > +#include <linux/types.h>
> > +
> > +struct cc_inval_params {
> > +	phys_addr_t addr;
> > +	size_t size;
> > +};
> > +
> > +struct cache_coherency_device;
> > +
> > +struct coherency_ops {
> > +	int (*wbinv)(struct cache_coherency_device *ccd, struct cc_inval_params *invp);
> > +	int (*done)(struct cache_coherency_device *ccd);
> > +};
> > +
> > +struct cache_coherency_device {
> > +	struct list_head node;
> > +	const struct coherency_ops *ops;
> > +};  
> 
> Why is this called a device when there is no 'struct device'?
> 
> This is just 'cache_coherency_ops'.

That's fair. The device went away as Greg KH quite reasonably didn't like the
idea of a struct device with no userspace ABI at all.

I'll rename the various register/unregister functions to use terminology
such as cache_coherency_ops_instance_register() to make it clear that each
call registers one instance of the ops rather than a single global set.

> 
> Are you sure that this structure does not need something like "priority" or
> "level" indicator to know where the ops should be sorted in a list? Or is
> it the responsibility of the arch to make sure that the registration order
> of the ops structures follows the hierarchy order of the caches?

For all known implementations where we actually need this (i.e. hosts with CXL
or similar), the implementation is in a device somewhere on the coherency fabric
that is capable of causing appropriate invalidation messages to be issued
to all caches, to the point where it knows that there are no copies in
the wrong state anywhere. In a simple model, an offload agent has grabbed
exclusive ownership of the line and written its content back to memory.

The multiple 'device' support is about different cachelines being the
responsibility of different cache flushing 'devices' (interleave, multiple
sockets etc), not a single line being flushed from different places.
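To illustrate why broadcasting to every agent needs no filtering in the core, here is a small sketch assuming a simplified interleave where each of `ways` agents owns every `ways`-th granule (real interleave maps may hash addresses differently). Each agent rounds the requested physical range out to its granule, which is one way over-invalidation arises, and does nothing for granules it does not own. struct interleave_agent and agent_granules_in_range() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simple-modulo interleave; not a description of real hardware. */
struct interleave_agent {
	unsigned int id;	/* which way of the interleave this agent owns */
	unsigned int ways;	/* number of agents in the interleave set */
	uint64_t granule;	/* interleave granule in bytes, power of two */
};

/*
 * Count the granules of [addr, addr + size) this agent would write back and
 * invalidate. The range is rounded out to granule boundaries, so an
 * unaligned request may over-invalidate neighbouring bytes. A result of
 * zero means the request is not this agent's concern.
 */
uint64_t agent_granules_in_range(const struct interleave_agent *ag,
				 uint64_t addr, uint64_t size)
{
	assert(ag->ways > 0 && ag->granule && !(ag->granule & (ag->granule - 1)));
	if (size == 0)
		return 0;

	uint64_t start = addr & ~(ag->granule - 1);
	uint64_t end = (addr + size + ag->granule - 1) & ~(ag->granule - 1);
	uint64_t count = 0;

	for (uint64_t pa = start; pa < end; pa += ag->granule)
		if ((pa / ag->granule) % ag->ways == ag->id)
			count++;
	return count;
}
```

With two ways and 256-byte granules, an aligned 256-byte request at address 0 touches only agent 0, while the same size starting at 0x80 straddles two granules and both agents act on one each.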

The PSCI spec alpha (which never went further) did allow for a case where a
complex timing dance was needed, but IIRC even that didn't assume an ordering
constraint across the various devices. It envisioned a stop-the-world situation
where all fetches were disabled until the line was definitely flushed by everyone.
Thankfully we don't know of any implementation that needs that.

We might need to extend things in future, but for now no ordering needed.
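Because no ordering is required, the core can use a plain two-phase pattern: kick off write-back and invalidate on every agent, then wait on each one that provides a completion check. The sketch below is a userspace model shaped like the v3 coherency_ops (mandatory wbinv, optional done); struct cc_ops, struct ccd, invalidate_all() and the example agents are hypothetical, and returning on the first error is a simplifying assumption rather than what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ccd;

/* Shaped like the v3 coherency_ops; done() is optional and may be NULL. */
struct cc_ops {
	int (*wbinv)(struct ccd *ccd, uint64_t addr, uint64_t size);
	int (*done)(struct ccd *ccd);
};

struct ccd {
	const struct cc_ops *ops;
	int polled;	/* bookkeeping for the example agents only */
};

/*
 * Phase 1 starts the operation on every agent so they can work in parallel.
 * Phase 2 waits on each agent that has a completion check. No ordering
 * between agents is assumed.
 */
int invalidate_all(struct ccd **agents, size_t n, uint64_t addr, uint64_t size)
{
	int ret;

	for (size_t i = 0; i < n; i++) {
		assert(agents[i]->ops && agents[i]->ops->wbinv);
		ret = agents[i]->ops->wbinv(agents[i], addr, size);
		if (ret)
			return ret;
	}
	for (size_t i = 0; i < n; i++) {
		if (!agents[i]->ops->done)
			continue;
		ret = agents[i]->ops->done(agents[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example agents: record how many kicks had happened before the first wait. */
static int kicked_total;
static int kicked_at_first_done = -1;

static int example_wbinv(struct ccd *ccd, uint64_t addr, uint64_t size)
{
	(void)ccd;
	(void)addr;
	(void)size;
	kicked_total++;
	return 0;
}

static int example_done(struct ccd *ccd)
{
	if (kicked_at_first_done < 0)
		kicked_at_first_done = kicked_total;
	ccd->polled++;
	return 0;
}

static const struct cc_ops ops_with_done = { example_wbinv, example_done };
static const struct cc_ops ops_no_done = { example_wbinv, NULL };
```

With three agents, every wbinv has been issued before the first done() runs, and an agent without done() is simply skipped in the second phase.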

> > diff --git a/lib/cache_maint.c b/lib/cache_maint.c
> > new file mode 100644
> > index 000000000000..05d9c5e99941
> > --- /dev/null
> > +++ b/lib/cache_maint.c
> > @@ -0,0 +1,128 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Generic support for Memory System Cache Maintenance operations.
> > + *
> > + * Coherency maintenance drivers register with this simple framework that will
> > + * iterate over each registered instance to first kick off invalidation and
> > + * then to wait until it is complete.
> > + *
> > + * If no implementations are registered yet cpu_cache_has_invalidate_memregion()
> > + * will return false. If this runs concurrently with unregistration then a
> > + * race exists but this is no worse than the case where the device responsible
> > + * for a given memory region has not yet registered.
> > + */
> > +#include <linux/cache_coherency.h>
> > +#include <linux/cleanup.h>
> > +#include <linux/container_of.h>
> > +#include <linux/export.h>
> > +#include <linux/list.h>
> > +#include <linux/memregion.h>
> > +#include <linux/module.h>
> > +#include <linux/rwsem.h>
> > +#include <linux/slab.h>
> > +
> > +static LIST_HEAD(cache_device_list);
> > +static DECLARE_RWSEM(cache_device_list_lock);
> > +
> > +void cache_coherency_device_free(struct cache_coherency_device *ccd)
> > +{
> > +	kfree(ccd);
> > +}
> > +EXPORT_SYMBOL_GPL(cache_coherency_device_free);  
> 
> Why do you need a new GPL export wrapper for kfree?

As per your other comment, I think this will become a kref_put().
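For readers unfamiliar with the pattern, a rough userspace analogue of moving from an exported kfree() wrapper to reference-counted release follows. In the kernel this would use struct kref with kref_init(), kref_get(), kref_put() and container_of(); here C11 atomics and offsetof() stand in for them. struct ref, struct cc_instance, cc_instance_alloc() and cc_instance_release() are hypothetical names, and the embedded counter is placed first only to mirror the v3 "first element" allocation convention.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct kref. */
struct ref {
	atomic_int count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);
}

static void ref_get(struct ref *r)
{
	int old = atomic_fetch_add(&r->count, 1);

	assert(old > 0);	/* taking a reference on a dead object is a bug */
}

/* Like kref_put(): returns true if this put dropped the last reference. */
static bool ref_put(struct ref *r, void (*release)(struct ref *r))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return true;
	}
	return false;
}

struct cc_instance {
	struct ref ref;
	int id;
};

static int releases;	/* how many times the release path ran */

static void cc_instance_release(struct ref *r)
{
	/* container_of() equivalent: recover the enclosing object. */
	struct cc_instance *inst = (struct cc_instance *)
		((char *)r - offsetof(struct cc_instance, ref));

	releases++;
	free(inst);
}

struct cc_instance *cc_instance_alloc(int id)
{
	struct cc_instance *inst = malloc(sizeof(*inst));

	if (!inst)
		return NULL;
	ref_init(&inst->ref);
	inst->id = id;
	return inst;
}
```

The owner of the last reference frees the object, so drivers never call a free helper directly and no dedicated export for it is needed.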
>