[PATCH] KVM: arm64: Add KVM_CAP_ARM_NATIVE_CACHE_CONFIG vcpu capability

David Woodhouse dwmw2 at infradead.org
Thu Apr 9 10:49:09 PDT 2026


On Thu, 2026-04-09 at 18:07 +0100, Marc Zyngier wrote:
> On Thu, 09 Apr 2026 16:29:06 +0100,
> David Woodhouse <dwmw2 at infradead.org> wrote:
> > 
> > From: David Woodhouse <dwmw at amazon.co.uk>
> > 
> > Commit 7af0c2534f4c5 ("KVM: arm64: Normalize cache configuration")
> > fabricates CLIDR_EL1 and CCSIDR_EL1 values instead of using the real
> > hardware values. While this provides consistent values across
> > heterogeneous CPUs, it does cause visible changes in the CPU model
> > exposed to guests.
> > 
> > The commit claims that userspace can restore the original values, but
> > there is no way for userspace to obtain the real CLIDR_EL1 register
> > value — it is not fully reconstructible from sysfs, which lacks the
> > LoC, LoUU, and LoUIS fields.
> > 
> > Add a per-vcpu KVM_CAP_ARM_NATIVE_CACHE_CONFIG capability that reads
> > the real CLIDR_EL1 and all CCSIDR_EL1 values from the current physical
> > CPU and sets them on the vcpu.
> > 
> > This allows hypervisors to present the real hardware cache configuration
> > to guests, which is important for consistency of the environment across
> > kernel versions and for migration compatibility with hosts running
> > older kernels that exposed the real values.
> > 
> > Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
> > Signed-off-by: David Woodhouse <dwmw at amazon.co.uk>
> > ---
> >  Documentation/virt/kvm/api.rst                | 23 ++++++++
> >  arch/arm64/include/asm/kvm_host.h             |  1 +
> >  arch/arm64/kvm/arm.c                          | 17 ++++++
> >  arch/arm64/kvm/sys_regs.c                     | 26 ++++++++++
> >  include/uapi/linux/kvm.h                      |  1 +
> >  tools/testing/selftests/kvm/Makefile.kvm      |  1 +
> >  .../selftests/kvm/arm64/native_cache_config.c | 52 +++++++++++++++++++
> >  7 files changed, 121 insertions(+)
> >  create mode 100644 tools/testing/selftests/kvm/arm64/native_cache_config.c
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index e3b3bd9edeec..ee47dc07ceac 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -8930,6 +8930,29 @@ no-op.
> >  
> >  ``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
> >  
> > +7.48 KVM_CAP_ARM_NATIVE_CACHE_CONFIG
> > +-------------------------------------
> > +
> > +:Architecture: arm64
> > +:Target: vcpu
> > +:Parameters: none
> > +:Returns: 0 on success, -ENOMEM on allocation failure, -EINVAL if
> > +          args[0] or flags are non-zero.
> > +
> > +This per-vcpu capability reads the real CLIDR_EL1 and CCSIDR_EL1 values
> > +from the physical CPU on which the ioctl is executed, and sets them on
> > +the vcpu. This replaces the fabricated cache configuration that KVM
> > +provides by default.
> > +
> > +The caller should ensure the vcpu thread is pinned to the desired
> > +physical CPU before invoking this capability, so that the correct cache
> > +topology is captured. On heterogeneous systems, different physical CPUs
> > +may have different cache configurations.
> > +
> > +After this capability is enabled, the vcpu's CLIDR_EL1 and CCSIDR_EL1
> > +values can still be overridden individually via ``KVM_SET_ONE_REG`` and
> > +the ``KVM_REG_ARM_DEMUX`` interface.
> > +
> >  8. Other capabilities.
> >  ======================
> >  
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index a1bb025c641f..c9713a472c47 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1296,6 +1296,7 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm);
> >  void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
> >  
> >  int __init kvm_sys_reg_table_init(void);
> > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu);
> >  struct sys_reg_desc;
> >  int __init populate_sysreg_config(const struct sys_reg_desc *sr,
> >  				  unsigned int idx);
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 326a99fea753..579583e8dc5c 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -393,6 +393,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_ARM_DISABLE_EXITS:
> >  		r = KVM_ARM_DISABLE_VALID_EXITS;
> >  		break;
> > +	case KVM_CAP_ARM_NATIVE_CACHE_CONFIG:
> > +	case KVM_CAP_ENABLE_CAP:
> > +		r = 1;
> > +		break;
> >  	case KVM_CAP_SET_GUEST_DEBUG2:
> >  		return KVM_GUESTDBG_VALID_MASK;
> >  	case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > @@ -1793,6 +1797,19 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> >  		r = kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
> >  		break;
> >  	}
> > +	case KVM_ENABLE_CAP: {
> > +		struct kvm_enable_cap cap;
> > +
> > +		r = -EFAULT;
> > +		if (copy_from_user(&cap, argp, sizeof(cap)))
> > +			break;
> > +
> > +		r = -EINVAL;
> > +		if (cap.cap == KVM_CAP_ARM_NATIVE_CACHE_CONFIG &&
> > +		    !cap.args[0] && !cap.flags)
> > +			r = kvm_vcpu_set_native_cache_config(vcpu);
> > +		break;
> > +	}
> >  	case KVM_SET_ONE_REG:
> >  	case KVM_GET_ONE_REG: {
> >  		struct kvm_one_reg reg;
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 1b4cacb6e918..c19d84e48f8b 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -484,6 +484,32 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
> >  	return 0;
> >  }
> >  
> > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu)
> > +{
> > +	u32 csselr;
> > +
> > +	if (!vcpu->arch.ccsidr) {
> > +		vcpu->arch.ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32),
> > +						  GFP_KERNEL_ACCOUNT);
> > +		if (!vcpu->arch.ccsidr)
> > +			return -ENOMEM;
> > +	}
> 
> Well, no.
> 
> The moment you decide to expose all of the host's crap, you really
> need to put everything on the table. It means fully handling
> FEAT_CCIDX, which we were careful not to expose anywhere because it is
> a terrible idea.

The intent here is not to "expose all of the host's crap", but to
maintain compatibility with what the kernel did before commit
7af0c2534f4c. No need to expose FEAT_CCIDX.

> > +	for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
> > +		write_sysreg(csselr, csselr_el1);
> > +		isb();
> > +		vcpu->arch.ccsidr[csselr] = read_sysreg(ccsidr_el1);
> 
> That's not how the selection register works. CLIDR_EL1 tells you what
> each cache level is (Instructions, Data, Unified, Tags), and that must
> be combined with the index (which doesn't start at bit 0).

Ack, thanks. I'll rework that based on the old is_valid_cache()
function.
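
For reference, the selection scheme described above can be sketched in
plain C. This is a standalone user-space model, not kernel code:
enumerate_csselr() and the macro names are invented for illustration,
and the logic only roughly follows the old is_valid_cache() (decode
Ctype<n> from CLIDR_EL1, InD in bit 0, Level in bits [3:1]). Tag caches
(TnD) and FEAT_CCIDX are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CLIDR_EL1.Ctype<n>: 3 bits per cache level, 7 levels at most. */
#define CLIDR_CTYPE_BITS	3
#define CLIDR_CTYPE_MASK	0x7u
#define CLIDR_MAX_LEVELS	7

/* CSSELR_EL1: InD is bit 0, Level is bits [3:1]. TnD (bit 4) is not modelled. */
#define CSSELR_IND		0x1u
#define CSSELR_LEVEL_SHIFT	1

/* Worst case: separate I and D caches at every level. */
#define CSSELR_MAX_ENTRIES	(2 * CLIDR_MAX_LEVELS)

/*
 * Hypothetical helper: fill out[] (at least CSSELR_MAX_ENTRIES long) with
 * every CSSELR_EL1 value that selects a cache implemented according to
 * clidr, and return the number of entries written.
 */
size_t enumerate_csselr(uint64_t clidr, uint32_t *out)
{
	size_t n = 0;
	unsigned int level;

	assert(out != NULL);

	for (level = 0; level < CLIDR_MAX_LEVELS; level++) {
		unsigned int ctype = (clidr >> (level * CLIDR_CTYPE_BITS)) &
				     CLIDR_CTYPE_MASK;
		uint32_t sel = (uint32_t)level << CSSELR_LEVEL_SHIFT;

		if (ctype == 0)		/* No cache here or further out */
			break;

		switch (ctype) {
		case 1:			/* Instruction cache only */
			out[n++] = sel | CSSELR_IND;
			break;
		case 2:			/* Data cache only */
		case 4:			/* Unified cache */
			out[n++] = sel;
			break;
		case 3:			/* Separate instruction and data caches */
			out[n++] = sel;
			out[n++] = sel | CSSELR_IND;
			break;
		default:		/* Reserved encoding: skip this level */
			break;
		}
	}

	return n;
}
```

For a CLIDR_EL1 value of 0x123 (separate L1 I/D, unified L2 and L3) this
yields the selectors 0, 1, 2 and 4. A kernel version would write each of
them to CSSELR_EL1 and issue an ISB before reading CCSIDR_EL1.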

> I also wonder how you reconcile not exposing MTE when the cache
> hierarchy indicates support for tags. That clearly contradicts "report
> what the HW has".

If that were an issue, it would already have been an issue before
commit 7af0c2534f4c (and in kernels with that commit reverted), which
host millions of guests today.

This isn't about introducing *new* behaviour; it's about allowing the
existing established behaviour to be maintained so that we can have a
*managed* transition to the new model (for new launches) rather than an
unconditional, uncontrolled change as the kernel gets upgraded.