[RFC PATCH 2/2] arm64: mm: add SMCCC-backed cache invalidate provider
Dan Williams (nvidia)
djbw at kernel.org
Thu May 21 13:10:20 PDT 2026
Srirangan Madhavan wrote:
> Add an arm64 cache maintenance backend that discovers SMCCC cache
> clean+invalidate support, queries attributes, handles transient BUSY and
> RATE_LIMITED responses with bounded retries, and registers with the generic
> cache coherency framework.
>
> Signed-off-by: Srirangan Madhavan <smadhavan at nvidia.com>
> ---
> MAINTAINERS | 1 +
> arch/arm64/mm/Makefile | 1 +
> arch/arm64/mm/cache_maint.c | 180 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 182 insertions(+)
> create mode 100644 arch/arm64/mm/cache_maint.c
[..]
> +static int arm64_smccc_cache_wbinv(struct cache_coherency_ops_inst *cci,
> + struct cc_inval_params *invp)
> +{
> + struct arm64_smccc_cache *cache =
> + container_of(cci, struct arm64_smccc_cache, cci);
> + struct arm_smccc_res res = {};
> + int delay_us = smccc_cache_delay_us(cache);
> + u64 gen = 0;
> + s32 status;
> + int ret;
> + int i;
> +
> + if (!invp->size)
> + return -EINVAL;
> +
> + if (cache->global_op)
> + gen = READ_ONCE(cache->global_flush_gen);
> +
> + guard(mutex)(&cache->lock);
> +
> + /*
> + * If firmware reports a global operation type, a successful operation
> + * covers every request that was already waiting behind it. Skip if the
> + * generation advanced while this request was waiting to enter the
> + * serialized firmware call path.
> + */
> + if (cache->global_op && gen != READ_ONCE(cache->global_flush_gen))
> + return 0;
Hmm, this looks like it could under-flush, which is worse than
over-flushing. The ordering is:
CPU0                CPU1
<dirty>
flush_gen==0
lock
flush_gen==0
flush               <dirty>
flush_gen++         flush_gen==0
                    lock
                    flush_gen==1
                    skip
I.e. if CPU1 dirties lines while CPU0 is still flushing, then there is a
window for CPU1 to sample flush_gen before CPU0 bumps it, see the
updated value once it takes the lock, and skip when it needs to follow
on with a new flush cycle. So this either needs a more sophisticated
queue / batch system to track which requests might get satisfied while
waiting for a turn, or the optimization should just be dropped until it
is clear that its absence causes a problem in practice.
I think dropping the optimization is practical for now.
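To make the window concrete, here is a rough single-threaded replay of
the interleaving above as a userspace model. It is only a sketch:
struct model, locked_flush() and replay() are hypothetical names, not
anything from the patch, and the "flush" is just a flag being cleared.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the patch. */
struct model {
	unsigned long flush_gen;	/* stands in for global_flush_gen */
	bool dirty;			/* true while any cache line is dirty */
};

/*
 * The part of a request that runs under the lock. 'gen' is the value
 * sampled before taking the lock. Returns true if the flush was skipped.
 */
static bool locked_flush(struct model *m, unsigned long gen, bool skip_opt)
{
	if (skip_opt && gen != m->flush_gen)
		return true;		/* assume the earlier flush covered us */

	m->dirty = false;		/* firmware clean+invalidate */
	m->flush_gen++;
	return false;
}

/*
 * Replay the interleaving from the diagram in program order.
 * Returns true if dirty data is still in the cache when CPU1 returns.
 */
static bool replay(bool skip_opt)
{
	struct model m = { 0 };
	unsigned long gen1;

	m.dirty = true;			/* CPU0: <dirty> */
	/* CPU0: flush_gen==0, lock, flush_gen==0, firmware call starts */
	m.dirty = false;		/* CPU0: flush has written back its lines */
	m.dirty = true;			/* CPU1: <dirty> while CPU0 is still flushing */
	gen1 = m.flush_gen;		/* CPU1: flush_gen==0, sampled before the lock */
	m.flush_gen++;			/* CPU0: flush_gen++, unlock */

	/* CPU1: lock, flush_gen==1, skip (or flush when skip_opt is off) */
	locked_flush(&m, gen1, skip_opt);

	return m.dirty;
}
```

In this model replay(true) leaves the line CPU1 dirtied in the cache,
while replay(false) always runs the second flush and ends up clean.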
> +
> + for (i = 0; i < SMCCC_CACHE_MAX_RETRIES; i++) {
> + /* Long firmware operations can trigger watchdog checks. */
> + touch_nmi_watchdog();
> +
> + arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_CLEAN_INV_MEMREGION,
> + invp->addr, invp->size, 0UL, &res);
> + status = (s32)res.a0;
> + ret = smccc_cache_status_to_errno(status);
> + if (!ret) {
> + if (cache->global_op) {
> + WRITE_ONCE(cache->global_flush_gen,
> + cache->global_flush_gen + 1);
> + }
> + return 0;
> + }
> +
> + if (ret != -EBUSY && ret != -EAGAIN)
> + return ret;
I notice that cxl_region_invalidate_memregion() only expects failures to
find a flush capability, not failures to execute a flush.
Just a note to circle back to this concern.
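For when this gets picked up again, a minimal sketch of the distinction
in question. This is a hypothetical userspace model, not the CXL code:
struct fake_provider and invalidate_or_fail() are made-up names, and the
-ENXIO choice for "no capability" is an assumption for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a registered cache invalidate provider. */
struct fake_provider {
	bool present;	/* is any flush capability available? */
	int exec_ret;	/* result of executing the flush: 0 or -errno */
};

/*
 * Hypothetical caller. Returns 0 on success, -ENXIO (assumed) when no
 * flush capability exists, and otherwise propagates the execution error
 * rather than assuming the flush happened.
 */
static int invalidate_or_fail(const struct fake_provider *p)
{
	if (!p->present)
		return -ENXIO;

	return p->exec_ret;
}
```

The point is only that a caller which checks for the first case and then
ignores the return value of the flush itself would treat an -EBUSY or
-EAGAIN that survived the retries as a completed invalidation.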