[PATCH v1 15/16] kvm: arm64: Allow configuring physical address space size

Fri Feb 9 00:16:06 PST 2018

On Thu, Feb 08, 2018 at 05:53:17PM +0000, Suzuki K Poulose wrote:
> On 08/02/18 11:14, Christoffer Dall wrote:
> >On Tue, Jan 09, 2018 at 07:04:10PM +0000, Suzuki K Poulose wrote:
> >>Allow the guests to choose a larger physical address space size.
> >>The default and minimum size is 40 bits. A guest can change this
> >>right after the VM creation, but before the stage2 entry page
> >>tables are allocated (i.e., before it registers a memory range
> >>or maps a device address). The size is restricted to the maximum
> >>supported by the host. Also, the guest can only increase the PA size
> >>from the existing value, as reducing it could break devices which
> >>may have verified their physical address for validity and may do a
> >>lazy mapping (e.g., VGIC).
> >>
> >>Cc: Marc Zyngier <marc.zyngier at arm.com>
> >>Cc: Christoffer Dall <cdall at linaro.org>
> >>Cc: Peter Maydell <peter.maydell at linaro.org>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose at arm.com>
> >>---
> >>  Documentation/virtual/kvm/api.txt | 27 ++++++++++++++++++++++++++
> >>  arch/arm/include/asm/kvm_host.h   |  7 +++++++
> >>  arch/arm64/include/asm/kvm_host.h |  1 +
> >>  arch/arm64/include/asm/kvm_mmu.h  | 41 ++++++++++++++++++++++++++++++---------
> >>  arch/arm64/kvm/reset.c            | 28 ++++++++++++++++++++++++++
> >>  include/uapi/linux/kvm.h          |  4 ++++
> >>  virt/kvm/arm/arm.c                |  2 +-
> >>  7 files changed, 100 insertions(+), 10 deletions(-)
> >>
> >>diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >>index 57d3ee9e4bde..a203faf768c4 100644
> >>--- a/Documentation/virtual/kvm/api.txt
> >>+++ b/Documentation/virtual/kvm/api.txt
> >>@@ -3403,6 +3403,33 @@ invalid, if invalid pages are written to (e.g. after the end of memory)
> >>  or if no page table is present for the addresses (e.g. when using
> >>  hugepages).
> >>+4.109 KVM_ARM_GET_PHYS_SHIFT
> >>+
> >>+Capability: KVM_CAP_ARM_CONFIG_PHYS_SHIFT
> >>+Architectures: arm64
> >>+Type: vm ioctl
> >>+Parameters: __u32 (out)
> >>+Returns: 0 on success, a negative value on error
> >>+
> >>+This ioctl is used to get the current maximum physical address size for
> >>+the VM. The value is Log2(Maximum_Physical_Address). This is neither the
> >>+amount of physical memory assigned to the VM nor the maximum physical address
> >>+that a real CPU on the host can handle. Rather, this is the upper limit of the
> >>+guest physical address that can be used by the VM.
> >
> >What is the point of this?  Presumably if userspace has set the size, it
> >already knows the size?
> 
> This can help userspace know what the "default" limit is. That said, I am
> not particular about keeping this around.
> 

Userspace has to already know, since otherwise things don't work today,
so I think we can omit this.

> >
> >>+
> >>+4.110 KVM_ARM_SET_PHYS_SHIFT
> >>+
> >>+Capability: KVM_CAP_ARM_CONFIG_PHYS_SHIFT
> >>+Architectures: arm64
> >>+Type: vm ioctl
> >>+Parameters: __u32 (in)
> >>+Returns: 0 on success, a negative value on error
> >>+
> >>+This ioctl is used to set the maximum physical address size for
> >>+the VM. The value is Log2(Maximum_Physical_Address). The value can only
> >>+be increased from the existing setting. Attempting to change the value
> >>+after the stage-2 page tables are allocated returns an error.
> >>+
> >
> >Is there a way for userspace to discover what the underlying hardware
> >can actually support, beyond trial-and-error on this ioctl?
> 
> Unfortunately, there is none. We don't expose ID_AA64MMFR0 via mrs emulation.
> 

We should probably think about that.  Perhaps it could be tied to the
value returned when checking KVM_CAP_ARM_CONFIG_PHYS_SHIFT?
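
To make the idea concrete, here is a rough standalone model of what the
capability check could report.  This is only a sketch: cap_phys_shift()
is a made-up name, and returning 0 for a reserved PARange encoding is my
assumption, not something the patch does.  The PARange encodings are the
architected ones; the 40-bit floor and the PHYS_MASK_SHIFT clamp mirror
the checks in the quoted kvm_reconfig_stage2().

```c
/* Architected ID_AA64MMFR0_EL1.PARange encodings -> PA size in bits. */
static unsigned int parange_to_phys_shift(unsigned int parange)
{
	switch (parange) {
	case 0: return 32;
	case 1: return 36;
	case 2: return 40;
	case 3: return 42;
	case 4: return 44;
	case 5: return 48;
	case 6: return 52;
	default: return 0;	/* reserved encoding */
	}
}

/*
 * Hypothetical helper: the value the capability check might return,
 * i.e. the largest phys_shift that the SET ioctl would accept.
 * phys_mask_shift stands in for the kernel's PHYS_MASK_SHIFT.
 */
static unsigned int cap_phys_shift(unsigned int parange,
				   unsigned int phys_mask_shift)
{
	unsigned int pa_max = parange_to_phys_shift(parange);

	/* Assumption: report "not supported" for unknown encodings. */
	if (!pa_max)
		return 0;

	/* Same backward-compatibility floor as the patch. */
	if (pa_max < 40)
		pa_max = 40;

	/* Never advertise more than the kernel itself can address. */
	if (pa_max > phys_mask_shift)
		pa_max = phys_mask_shift;

	return pa_max;
}
```

With PARange == 1 (36 bits) and a 48-bit PHYS_MASK_SHIFT this reports 40,
so userspace would know the upper bound without probing the ioctl.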

> >>+static inline int kvm_reconfig_stage2(struct kvm *kvm, u32 phys_shift)
> >>+{
> >>+	int rc = 0;
> >>+	unsigned int pa_max, parange;
> >>+
> >>+	parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 7;
> >>+	pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
> >>+	/* Raise it to 40bits for backward compatibility */
> >>+	pa_max = (pa_max < 40) ? 40 : pa_max;
> >>+	/* Make sure the size is supported/available */
> >>+	if (phys_shift > PHYS_MASK_SHIFT || phys_shift > pa_max)
> >>+		return -EINVAL;
> >>+	/*
> >>+	 * The stage2 PGD is dependent on the settings we initialise here
> >>+	 * and should be allocated only after this step. We cannot allow
> >>+	 * down sizing the IPA size as there could be devices or memory
> >>+	 * regions, that depend on the previous size.
> >>+	 */
> >>+	mutex_lock(&kvm->slots_lock);
> >>+	if (kvm->arch.pgd || phys_shift < kvm->arch.phys_shift) {
> >>+		rc = -EPERM;
> >>+	} else if (phys_shift > kvm->arch.phys_shift) {
> >>+		kvm->arch.phys_shift = phys_shift;
> >>+		kvm->arch.s2_levels = stage2_pt_levels(kvm->arch.phys_shift);
> >>+		kvm->arch.vtcr_private = VTCR_EL2_SL0(kvm->arch.s2_levels) |
> >>+					 TCR_T0SZ(kvm->arch.phys_shift);
> >>+	}
> >
> >I think you can rework the above to make it more obvious what's going on
> >in this way:
> >
> >	rc = -EPERM;
> >	if (kvm->arch.pgd || phys_shift < kvm->arch.phys_shift)
> >		goto out_unlock;
> >
> >	rc = 0;
> >	if (phys_shift == kvm->arch.phys_shift)
> >		goto out_unlock;
> >
> >	kvm->arch.phys_shift = phys_shift;
> >	kvm->arch.s2_levels = stage2_pt_levels(kvm->arch.phys_shift);
> >	kvm->arch.vtcr_private = VTCR_EL2_SL0(kvm->arch.s2_levels) |
> >				 TCR_T0SZ(kvm->arch.phys_shift);
> >
> >out_unlock:
> >
> 
> Sure.
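
For reference, the whole flow I have in mind looks roughly like the
standalone model below.  It is a sketch, not the kernel code: struct
vm_model, vm_model_set_phys_shift() and the lock_depth counter (standing
in for kvm->slots_lock) are invented for illustration, pa_max is passed
in rather than derived from the ID register, and the s2_levels and
vtcr_private updates are left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-in for the relevant bits of struct kvm. */
struct vm_model {
	bool pgd_allocated;	/* models kvm->arch.pgd != NULL */
	unsigned int phys_shift;	/* models kvm->arch.phys_shift */
	int lock_depth;		/* models holding kvm->slots_lock */
};

static int vm_model_set_phys_shift(struct vm_model *vm,
				   unsigned int phys_shift,
				   unsigned int pa_max)
{
	int rc;

	/* Size must be supported before we look at the VM state. */
	if (phys_shift > pa_max)
		return -EINVAL;

	vm->lock_depth++;	/* mutex_lock(&kvm->slots_lock) */

	/* Too late once the PGD exists; shrinking is never allowed. */
	rc = -EPERM;
	if (vm->pgd_allocated || phys_shift < vm->phys_shift)
		goto out_unlock;

	/* Setting the current value again is a no-op. */
	rc = 0;
	if (phys_shift == vm->phys_shift)
		goto out_unlock;

	vm->phys_shift = phys_shift;

out_unlock:
	vm->lock_depth--;	/* mutex_unlock(&kvm->slots_lock) */
	return rc;
}
```

The point of the early exits is that each failure condition sets rc
right before its check, and every path after the lock is taken leaves
through out_unlock.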
> 
> 
> 
> >>--- a/virt/kvm/arm/arm.c
> >>+++ b/virt/kvm/arm/arm.c
> >>@@ -1136,7 +1136,7 @@ long kvm_arch_vm_ioctl(struct file *filp,
> >>  		return 0;
> >>  	}
> >>  	default:
> >>-		return -EINVAL;
> >>+		return kvm_arch_dev_vm_ioctl(kvm, ioctl, arg);
> >>  	}
> >>  }
> >>-- 
> >>2.13.6
> >>
> >
> >Have you considered making this capability a generic capability and
> >encoding this in the 'type' argument to KVM_CREATE_VM?  This would
> >significantly simplify all the above and would allow you to drop patch 8
> >and 9 I think.
> 
> No. I will take a look. Btw, there were patches flying around to support
> "userspace" requesting specific values for ID feature registers. But even that
> doesn't help with the detection part. Maybe that is another way to configure
> the size, but I am not sure about the current status of that work.
> 

It's a bit stranded.  Drew was driving this work (on cc).  But the ID
register exposed to the guest should represent the actual limits
of the VM, so I don't think we need userspace to configure this; we
can implement it in KVM based on the PA range configured for the VM.
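
Something along these lines is what I mean, as a standalone sketch only.
The helper names are invented, and rounding a non-architected size down
to the nearest PARange encoding is my assumption about what we would
want the guest to see.  PARange is bits [3:0] of ID_AA64MMFR0_EL1.

```c
#include <stdint.h>

/* PA sizes for PARange encodings 0..6, in bits. */
static const unsigned int parange_bits[] = { 32, 36, 40, 42, 44, 48, 52 };

/*
 * Hypothetical helper: largest PARange encoding whose size does not
 * exceed phys_shift, or -1 if phys_shift is below 32 bits.
 */
static int phys_shift_to_parange(unsigned int phys_shift)
{
	int parange = -1;
	unsigned int i;

	for (i = 0; i < sizeof(parange_bits) / sizeof(parange_bits[0]); i++) {
		if (parange_bits[i] <= phys_shift)
			parange = (int)i;
	}
	return parange;
}

/*
 * Hypothetical helper: take the host's ID_AA64MMFR0_EL1 value and
 * replace PARange with the one matching the VM's configured size.
 * The value is returned unchanged if the size has no valid encoding.
 */
static uint64_t guest_id_aa64mmfr0(uint64_t host_val, unsigned int phys_shift)
{
	int parange = phys_shift_to_parange(phys_shift);

	if (parange < 0)
		return host_val;

	return (host_val & ~UINT64_C(0xf)) | (uint64_t)parange;
}
```

That way the guest never sees a PARange larger than what its stage 2
can actually map, and userspace does not have to set the field itself.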

Thanks,
-Christoffer