[RFC PATCH 2/2] arm64: mm: add SMCCC-backed cache invalidate provider

Thu May 21 13:10:20 PDT 2026

Srirangan Madhavan wrote:
> Add an arm64 cache maintenance backend that discovers SMCCC cache
> clean+invalidate support, queries attributes, handles transient BUSY and
> RATE_LIMITED responses with bounded retries, and registers with the generic
> cache coherency framework.
> 
> Signed-off-by: Srirangan Madhavan <smadhavan at nvidia.com>
> ---
>  MAINTAINERS                 |   1 +
>  arch/arm64/mm/Makefile      |   1 +
>  arch/arm64/mm/cache_maint.c | 180 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 182 insertions(+)
>  create mode 100644 arch/arm64/mm/cache_maint.c
[..]
> +static int arm64_smccc_cache_wbinv(struct cache_coherency_ops_inst *cci,
> +				   struct cc_inval_params *invp)
> +{
> +	struct arm64_smccc_cache *cache =
> +		container_of(cci, struct arm64_smccc_cache, cci);
> +	struct arm_smccc_res res = {};
> +	int delay_us = smccc_cache_delay_us(cache);
> +	u64 gen = 0;
> +	s32 status;
> +	int ret;
> +	int i;
> +
> +	if (!invp->size)
> +		return -EINVAL;
> +
> +	if (cache->global_op)
> +		gen = READ_ONCE(cache->global_flush_gen);
> +
> +	guard(mutex)(&cache->lock);
> +
> +	/*
> +	 * If firmware reports a global operation type, a successful operation
> +	 * covers every request that was already waiting behind it. Skip if the
> +	 * generation advanced while this request was waiting to enter the
> +	 * serialized firmware call path.
> +	 */
> +	if (cache->global_op && gen != READ_ONCE(cache->global_flush_gen))
> +		return 0;

Hmm, this looks like it could under-flush, which is worse than
over-flushing. The problematic ordering is:

CPU0			CPU1
<dirty>
flush_gen==0
lock
flush_gen==0
flush			<dirty>	
flush_gen++		flush_gen==0	
			lock
			flush_gen==1
			skip

I.e. if CPU1 dirties data while CPU0 is still flushing, then there is a
window for CPU1 to sample flush_gen before CPU0 increments it, observe
the updated value under the lock, and skip when it needs to follow on
with a new flush cycle, because CPU0's in-flight flush is not guaranteed
to cover CPU1's data. So either this needs a more sophisticated queue /
batch system to track which requests might get satisfied while waiting
for a turn, or the optimization should be dropped until it is clear that
the redundant flushes cause a problem in practice.

I think dropping the optimization is practical for now.
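
For reference, a minimal single-threaded userspace model of the
interleaving above. It is only a sketch: every name in it is
hypothetical and not from the patch, it assumes a flush covers only
data that was dirty when the flush began, and it ignores memory
ordering entirely. It replays the diagram once with the patch's
"generation changed" check, and once with one possible alternative that
snapshots a "flushes started" counter and skips only if a flush that
began after the snapshot has completed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in the patch. */
struct flush_model {
	uint64_t started;	/* number of flushes that have begun */
	uint64_t completed;	/* 'started' value of the last finished flush;
				 * stands in for flush_gen */
	bool dirty;		/* true while some data is not covered */
};

/* Assumption: a flush only covers data that was dirty when it began. */
static void flush_begin(struct flush_model *m)
{
	m->started++;
	m->dirty = false;
}

static void flush_end(struct flush_model *m)
{
	m->completed = m->started;
}

/* Check as in the patch: skip if the generation moved while waiting. */
static bool skip_if_gen_changed(const struct flush_model *m, uint64_t gen_snap)
{
	return m->completed != gen_snap;
}

/* Alternative: skip only if a flush begun after the snapshot finished. */
static bool skip_if_later_flush_done(const struct flush_model *m,
				     uint64_t started_snap)
{
	return m->completed > started_snap;
}

/*
 * Replay the interleaving from the diagram. Returns true if CPU1 skips
 * while its data is still dirty, i.e. an under-flush.
 */
static bool replay_under_flushes(bool use_started_snapshot)
{
	struct flush_model m = { .dirty = true };	/* CPU0: <dirty> */
	uint64_t gen_snap, started_snap;
	bool skip;

	flush_begin(&m);		/* CPU0: lock, flush */
	m.dirty = true;			/* CPU1: <dirty> during CPU0's flush */
	gen_snap = m.completed;		/* CPU1: flush_gen==0 */
	started_snap = m.started;	/* CPU1: sees CPU0's flush in flight */
	flush_end(&m);			/* CPU0: flush_gen++, unlock */

	/* CPU1: lock, then decide whether to skip */
	skip = use_started_snapshot ?
		skip_if_later_flush_done(&m, started_snap) :
		skip_if_gen_changed(&m, gen_snap);
	return skip && m.dirty;
}
```

In this model replay_under_flushes(false) reports the under-flush and
replay_under_flushes(true) does not. A real implementation of the
second variant would still need to order the snapshot after the
dirtying writes and account for failed firmware calls, which is part of
why dropping the optimization looks simpler.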

> +
> +	for (i = 0; i < SMCCC_CACHE_MAX_RETRIES; i++) {
> +		/* Long firmware operations can trigger watchdog checks. */
> +		touch_nmi_watchdog();
> +
> +		arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_CLEAN_INV_MEMREGION,
> +				     invp->addr, invp->size, 0UL, &res);
> +		status = (s32)res.a0;
> +		ret = smccc_cache_status_to_errno(status);
> +		if (!ret) {
> +			if (cache->global_op) {
> +				WRITE_ONCE(cache->global_flush_gen,
> +					   cache->global_flush_gen + 1);
> +			}
> +			return 0;
> +		}
> +
> +		if (ret != -EBUSY && ret != -EAGAIN)
> +			return ret;

I notice that cxl_region_invalidate_memregion() only expects a failure
to find a flush capability, not a failure to execute a flush.

Just a note to circle back to this concern.