[PATCH v3 3/4] KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attribute
Marc Zyngier
maz at kernel.org
Thu Jan 6 10:16:04 PST 2022
On Thu, 06 Jan 2022 11:54:11 +0000,
Alexandru Elisei <alexandru.elisei at arm.com> wrote:
>
> Hi Marc,
>
> On Tue, Dec 14, 2021 at 12:28:15PM +0000, Marc Zyngier wrote:
> > On Mon, 13 Dec 2021 15:23:08 +0000,
> > Alexandru Elisei <alexandru.elisei at arm.com> wrote:
> > >
> > > When KVM creates an event and there are more than one PMUs present on the
> > > system, perf_init_event() will go through the list of available PMUs and
> > > will choose the first one that can create the event. The order of the PMUs
> > > in the PMU list depends on the probe order, which can change under various
> > > circumstances, for example if the order of the PMU nodes change in the DTB
> > > or if asynchronous driver probing is enabled on the kernel command line
> > > (with the driver_async_probe=armv8-pmu option).
> > >
> > > Another consequence of this approach is that, on heterogeneous systems,
> > > all virtual machines that KVM creates will use the same PMU. This might
> > > cause unexpected behaviour for userspace: when a VCPU is executing on
> > > the physical CPU that uses this PMU, PMU events in the guest work
> > > correctly; but when the same VCPU executes on another CPU, PMU events in
> > > the guest will suddenly stop counting.
> > >
> > > Fortunately, perf core allows the user to specify on which PMU to create an
> > > event by using the perf_event_attr->type field, which is used by
> > > perf_init_event() as an index in the radix tree of available PMUs.
> > >
> > > Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU
> > > attribute to allow userspace to specify the arm_pmu that KVM will use when
> > > creating events for that VCPU. KVM will make no attempt to run the VCPU on
> > > the physical CPUs that share this PMU, leaving it up to userspace to
> > > manage the VCPU threads' affinity accordingly.
> > >
> > > Setting the PMU for a VCPU is an all or nothing affair to avoid exposing an
> > > asymmetric system to the guest: either all VCPUs have the same PMU, or
> > > none of the VCPUs have a PMU set. Attempting to do something in between
> > > will result in an error being returned when doing KVM_ARM_VCPU_PMU_V3_INIT.
> > >
> > > Signed-off-by: Alexandru Elisei <alexandru.elisei at arm.com>
> > > ---
> > >
> > > Checking that all VCPUs have the same PMU is done when the PMU is
> > > initialized because setting the VCPU PMU is optional, and KVM cannot know
> > > what the user intends until the KVM_ARM_VCPU_PMU_V3_INIT ioctl, which
> > > prevents further changes to the VCPU PMU. vcpu->arch.pmu.created has been
> > > changed to an atomic variable because changes to the VCPU PMU state now
> > > need to be observable by all physical CPUs.
> > >
> > > Documentation/virt/kvm/devices/vcpu.rst | 30 ++++++++-
> > > arch/arm64/include/uapi/asm/kvm.h | 1 +
> > > arch/arm64/kvm/pmu-emul.c | 88 ++++++++++++++++++++-----
> > > include/kvm/arm_pmu.h | 4 +-
> > > tools/arch/arm64/include/uapi/asm/kvm.h | 1 +
> > > 5 files changed, 104 insertions(+), 20 deletions(-)
> > >
> > > [..]
> > > -static u32 kvm_pmu_event_mask(struct kvm *kvm)
> > > +static u32 kvm_pmu_event_mask(struct kvm_vcpu *vcpu)
> > >  {
> > > -	switch (kvm->arch.pmuver) {
> > > +	unsigned int pmuver;
> > > +
> > > +	if (vcpu->arch.pmu.arm_pmu)
> > > +		pmuver = vcpu->arch.pmu.arm_pmu->pmuver;
> > > +	else
> > > +		pmuver = vcpu->kvm->arch.pmuver;
> >
> > This puzzles me throughout the whole patch. Why is the arm_pmu pointer
> > a per-CPU thing? I would absolutely expect it to be stored in the kvm
> > structure, making the whole thing much simpler.
>
> Reply below.
>
> >
> > > [..]
> > > @@ -637,8 +645,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx)
> > >  		return;
> > >
> > >  	memset(&attr, 0, sizeof(struct perf_event_attr));
> > > -	attr.type = PERF_TYPE_RAW;
> > > -	attr.size = sizeof(attr);
> >
> > Why is this line removed?
>
> Typo on my part, thank you for spotting it.
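>
> For the record, the intent was to keep attr.size and only switch the
> event type to the PMU set for the VCPU, so the fixed hunk should look
> roughly like this (a sketch of the intent, not the exact diff):
>
> 	memset(&attr, 0, sizeof(struct perf_event_attr));
> 	/* Use the userspace-selected PMU if there is one, PERF_TYPE_RAW otherwise */
> 	if (vcpu->arch.pmu.arm_pmu)
> 		attr.type = vcpu->arch.pmu.arm_pmu->pmu.type;
> 	else
> 		attr.type = PERF_TYPE_RAW;
> 	attr.size = sizeof(attr);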
>
> >
> > > [..]
> > > @@ -910,7 +922,16 @@ static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
> > >  	init_irq_work(&vcpu->arch.pmu.overflow_work,
> > >  		      kvm_pmu_perf_overflow_notify_vcpu);
> > >
> > > -	vcpu->arch.pmu.created = true;
> > > +	atomic_set(&vcpu->arch.pmu.created, 1);
> > > +
> > > +	kvm_for_each_vcpu(i, v, kvm) {
> > > +		if (!atomic_read(&v->arch.pmu.created))
> > > +			continue;
> > > +
> > > +		if (v->arch.pmu.arm_pmu != arm_pmu)
> > > +			return -ENXIO;
> > > +	}
> >
> > If you did store the arm_pmu at the VM level, you wouldn't need this.
> > You could detect the discrepancy in the set_pmu ioctl.
>
> I chose to set it at the VCPU level to be consistent with how KVM treats the
> PMU interrupt ID when the interrupt is a PPI, where the interrupt ID must
> be the same for all VCPUs and is stored at the VCPU level. However, looking at
> the code again, it occurs to me that it is stored at the VCPU level when it's a
> PPI because it's simpler to do it that way, as the code remains the same
> when the interrupt ID is an SPI, which must be *different* between VCPUs. So
> in the end, having the PMU stored at the VM level does match how KVM uses
> it, which looks to be better than my approach.
>
> This is the change you proposed in your branch [1]:
>
> +static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
> +{
> +	struct kvm *kvm = vcpu->kvm;
> +	struct arm_pmu_entry *entry;
> +	struct arm_pmu *arm_pmu;
> +	int ret = -ENXIO;
> +
> +	mutex_lock(&kvm->lock);
> +	mutex_lock(&arm_pmus_lock);
> +
> +	list_for_each_entry(entry, &arm_pmus, entry) {
> +		arm_pmu = entry->arm_pmu;
> +		if (arm_pmu->pmu.type == pmu_id) {
> +			/* Can't change PMU if filters are already in place */
> +			if (kvm->arch.arm_pmu != arm_pmu &&
> +			    kvm->arch.pmu_filter) {
> +				ret = -EBUSY;
> +				break;
> +			}
> +
> +			kvm->arch.arm_pmu = arm_pmu;
> +			ret = 0;
> +			break;
> +		}
> +	}
> +
> +	mutex_unlock(&arm_pmus_lock);
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}
>
> As I understand the code, userspace only needs to call
> KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) *once* (on one VCPU
> fd) to set the PMU for all the VCPUs; subsequent calls (on the same VCPU or
> on another VCPU) with a different PMU id will change the PMU for all VCPUs.
>
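> For completeness, the userspace side of this would look something like
> the snippet below (a sketch; pmu_id is assumed to be the value read from
> the PMU's sysfs "type" file, and vcpu_fd an already created VCPU fd):
>
> 	int pmu_id = ...; /* e.g. from /sys/bus/event_source/devices/<pmu>/type */
> 	struct kvm_device_attr attr = {
> 		.group	= KVM_ARM_VCPU_PMU_V3_CTRL,
> 		.attr	= KVM_ARM_VCPU_PMU_V3_SET_PMU,
> 		.addr	= (__u64)&pmu_id,
> 	};
>
> 	if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr))
> 		/* -ENXIO: unknown PMU id, -EBUSY: can't change the PMU anymore */
> 		perror("KVM_ARM_VCPU_PMU_V3_SET_PMU");
>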
> Two remarks:
>
> 1. The documentation for the VCPU ioctls states this (from
> Documentation/virt/kvm/devices/vcpu.rst):
>
> "
> ======================
> Generic vcpu interface
> ======================
>
> The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
> KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
> kvm_device_attr as other devices, but **targets VCPU-wide settings and
> controls**" (emphasis added).
>
> But I guess having VCPU ioctls affect *only* the VCPU hasn't really been
> true ever since PMU event filtering has been added. I'll send a patch to
> change that part of the documentation for arm64.
>
> I was thinking maybe a VM capability would be better suited for changing a
> VM-wide setting, what do you think? I don't have a strong preference either
> way.
I'm not sure it is worth the hassle of changing the API, as we'll have
to keep the current one forever.
>
> 2. What's to stop userspace from changing the PMU after at least one VCPU has
> run? That can be easily observed by the guest when reading PMCEIDx_EL0.
That's a good point. We need something here. It is a bit odd, as to do
that you need to fully enable a PMU on one vcpu but not on the other,
then run the first while changing stuff on the other. Something along
those lines (untested):
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4bf28905d438..4f53520e84fd 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -139,6 +139,7 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+	bool ran_once;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 83297fa97243..3045d7f609df 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -606,6 +606,10 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.has_run_once = true;
 
+	mutex_lock(&kvm->lock);
+	kvm->arch.ran_once = true;
+	mutex_unlock(&kvm->lock);
+
 	kvm_arm_vcpu_init_debug(vcpu);
 
 	if (likely(irqchip_in_kernel(kvm))) {
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index dfc0430d6418..95100c541244 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -959,8 +959,9 @@ static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int pmu_id)
 		arm_pmu = entry->arm_pmu;
 		if (arm_pmu->pmu.type == pmu_id) {
 			/* Can't change PMU if filters are already in place */
-			if (kvm->arch.arm_pmu != arm_pmu &&
-			    kvm->arch.pmu_filter) {
+			if ((kvm->arch.arm_pmu != arm_pmu &&
+			     kvm->arch.pmu_filter) ||
+			    kvm->arch.ran_once) {
 				ret = -EBUSY;
 				break;
 			}
@@ -1040,6 +1041,11 @@ int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 		mutex_lock(&vcpu->kvm->lock);
 
+		if (vcpu->kvm->arch.ran_once) {
+			mutex_unlock(&vcpu->kvm->lock);
+			return -EBUSY;
+		}
+
 		if (!vcpu->kvm->arch.pmu_filter) {
 			vcpu->kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT);
 			if (!vcpu->kvm->arch.pmu_filter) {
which should prevent both the PMU and the filters from being changed
once a single vcpu has run.
Thoughts?
M.
--
Without deviation from the norm, progress is not possible.