[RFC PATCH 19/45] KVM: arm64: iommu: Add domains

Jean-Philippe Brucker jean-philippe at linaro.org
Fri Jun 2 08:29:07 PDT 2023


Hi Mostafa,

On Fri, 19 May 2023 at 16:33, Mostafa Saleh <smostafa at google.com> wrote:
>
> Hi Jean,
>
> On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> > +/*
> > + * Serialize access to domains and IOMMU driver internal structures (command
> > + * queue, device tables)
> > + */
> > +static hyp_spinlock_t iommu_lock;
> > +
> I was looking more into this lock and I think we can make it per-IOMMU
> instead of having one big lock, to avoid contention, as I see it is only
> used to protect per-IOMMU resources.

Yes, it's a major bottleneck, thanks for looking into this. I think we'll
eventually want to improve scalability within an IOMMU and even within a
domain: a multi-queue network device will share a single domain between
multiple CPUs, each issuing lots of map/unmap calls. Or just two device
drivers working on different CPUs and sharing one IOMMU. In those cases a
per-IOMMU lock won't be good enough and we'll be back to this problem,
but it should already improve scalability for some systems.

Currently the global lock protects:

(1) The domain array (per IOMMU)
(2) The io-pgtables (per domain)
(3) The command queue (per SMMU)
(4) Power state (per SMMU)

I ran some experiments with refcounting the domains to avoid the lock for
(1) and (2), which improves map() scalability. See
https://jpbrucker.net/git/linux/commit/?h=pkvm/smmu-wip&id=5ad3bc6fd589b6f11fe31ccee5b8777ba8739167
(and another experiment to optimize the DMA refcount on branch pkvm/smmu-wip)
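
As a rough illustration, the refcounting looks something like this (a
minimal sketch based on that experiment; the helper names are mine and
the branch above differs in detail):

/*
 * Sketch: pin a domain so map/unmap can then run on its io-pgtable
 * without holding the global iommu_lock. domain->refs is the field this
 * series already uses to track attached devices; free_domain() refuses
 * to proceed while refs > 1.
 */
static struct kvm_hyp_iommu_domain *
domain_get(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
{
	struct kvm_hyp_iommu *iommu;
	struct kvm_hyp_iommu_domain *domain;

	hyp_spin_lock(&iommu_lock);
	domain = handle_to_domain(iommu_id, domain_id, &iommu);
	if (domain && domain->refs)
		domain->refs++;
	else
		domain = NULL;
	hyp_spin_unlock(&iommu_lock);
	return domain;
}

static void domain_put(struct kvm_hyp_iommu_domain *domain)
{
	hyp_spin_lock(&iommu_lock);
	domain->refs--;
	hyp_spin_unlock(&iommu_lock);
}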

For (2), the io-pgtable-arm code already supports concurrent accesses
(2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")). But
this one needs careful consideration, because in the host the io-pgtable
code trusts the device drivers. For example it expects that only buggy
drivers call map()/unmap() on the same IOVA concurrently. We need to make
sure a compromised host driver can't exploit these assumptions to do
anything nasty.
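
To give an idea of what "lockless" means there: io-pgtable-arm installs
intermediate page-table pointers with a cmpxchg instead of taking a lock,
roughly as below (simplified sketch in the spirit of
arm_lpae_install_table(); free_table() is a stand-in for the real
freeing path):

/*
 * Try to install a new table at *ptep, expecting the entry to be empty.
 * If another CPU won the race, back off and reuse theirs.
 */
static u64 install_table(u64 *ptep, u64 new_table)
{
	u64 old, empty = 0;

	old = cmpxchg64_relaxed(ptep, empty, new_table);
	if (old != empty) {
		/* Lost the race: free our table, use the existing one. */
		free_table(new_table);
		return old;
	}
	return new_table;
}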

There are options to improve (3) as well. The host SMMU driver supports a
lockless command queue (587e6c10a7ce ("iommu/arm-smmu-v3: Reduce
contention during command-queue insertion")) but it may be too complex to
put in the hypervisor (or rather, I haven't tried to understand it yet).
We could also wait for systems with ECMDQ, which enables per-CPU queues.
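
The gist of that lockless command queue, if we did want it in the
hypervisor: producers claim queue space with an atomic add on the
producer index, then fill their slots without a lock. A conceptual
sketch only (the real arm_smmu_cmdq_issue_cmdlist() also handles queue
wrap, waiting for space and CMD_SYNC ownership):

struct cmdq {
	atomic_t	prod;	/* producer index, claimed atomically */
	u32		nents;	/* power-of-two number of entries */
	u64		*base;	/* commands, two u64 words each */
};

static void cmdq_issue(struct cmdq *q, u64 *cmds, int n)
{
	int i;
	u32 prod = atomic_fetch_add(n, &q->prod);

	for (i = 0; i < n; i++) {
		u32 slot = (prod + i) & (q->nents - 1);

		q->base[slot * 2]     = cmds[i * 2];
		q->base[slot * 2 + 1] = cmds[i * 2 + 1];
	}
	/* A dma_wmb() and a write to the hardware prod register follow. */
}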

> Some special handling is needed as hyp_spinlock_t is not exposed to EL1.
> Maybe something like this:
>
> diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
> index b257a3b4bfc5..96d30a37f9e6 100644
> --- a/arch/arm64/kvm/hyp/hyp-constants.c
> +++ b/arch/arm64/kvm/hyp/hyp-constants.c
> @@ -3,11 +3,13 @@
>  #include <linux/kbuild.h>
>  #include <nvhe/memory.h>
>  #include <nvhe/pkvm.h>
> +#include <nvhe/spinlock.h>
>
>  int main(void)
>  {
>         DEFINE(STRUCT_HYP_PAGE_SIZE,    sizeof(struct hyp_page));
>         DEFINE(PKVM_HYP_VM_SIZE,        sizeof(struct pkvm_hyp_vm));
>         DEFINE(PKVM_HYP_VCPU_SIZE,      sizeof(struct pkvm_hyp_vcpu));
> +       DEFINE(HYP_SPINLOCK_SIZE,       sizeof(hyp_spinlock_t));
>         return 0;
>  }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index a90b97d8bae3..cf9195e24a08 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
>
>  obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
> +ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
>  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
>  arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
>  arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
> diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
> index ab888da731bc..82827b99b1ed 100644
> --- a/include/kvm/iommu.h
> +++ b/include/kvm/iommu.h
> @@ -5,6 +5,12 @@
>  #include <asm/kvm_host.h>
>  #include <kvm/power_domain.h>
>  #include <linux/io-pgtable.h>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +#include <nvhe/spinlock.h>
> +#else
> +#include "hyp_constants.h"
> +#endif
>
>  /*
>   * Parameters from the trusted host:
> @@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
>
>         struct io_pgtable_params        *pgtable;
>         bool                            power_is_off;
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +       hyp_spinlock_t                  iommu_lock;
> +#else
> +       u8 unused[HYP_SPINLOCK_SIZE];
> +#endif
>  };
>
>  struct kvm_hyp_iommu_memcache {
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index 1f4d5fcc1386..afaf173e65ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -14,12 +14,6 @@
>
>  struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
>
> -/*
> - * Serialize access to domains and IOMMU driver internal structures (command
> - * queue, device tables)
> - */
> -static hyp_spinlock_t iommu_lock;
> -
>  #define domain_to_iopt(_iommu, _domain, _domain_id)            \
>         (struct io_pgtable) {                                   \
>                 .ops = &(_iommu)->pgtable->ops,                 \
> @@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>  {
>         int ret = -EINVAL;
>         struct io_pgtable iopt;
> -       struct kvm_hyp_iommu *iommu;
> +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
>         struct kvm_hyp_iommu_domain *domain;
>
> -       hyp_spin_lock(&iommu_lock);
> +       hyp_spin_lock(&iommu->iommu_lock);
>         domain = handle_to_domain(iommu_id, domain_id, &iommu);
>         if (!domain)
>                 goto out_unlock;
> @@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
>         domain->refs = 1;
>         domain->pgd = iopt.pgd;
>  out_unlock:
> -       hyp_spin_unlock(&iommu_lock);
> +       hyp_spin_unlock(&iommu->iommu_lock);
>         return ret;
>  }
>
> @@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
>  {
>         int ret = -EINVAL;
>         struct io_pgtable iopt;
> -       struct kvm_hyp_iommu *iommu;
> +       struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
>         struct kvm_hyp_iommu_domain *domain;
>
> -       hyp_spin_lock(&iommu_lock);
> +       hyp_spin_lock(&iommu->iommu_lock);
>         domain = handle_to_domain(iommu_id, domain_id, &iommu);
>         if (!domain)
>                 goto out_unlock;
> @@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
>         memset(domain, 0, sizeof(*domain));
>
>  out_unlock:
> -       hyp_spin_unlock(&iommu_lock);
> +       hyp_spin_unlock(&iommu->iommu_lock);
>         return ret;
>  }
> --
>
> (I didn't include the full patch as it is too long; the rest is mainly
> s/&iommu_lock/&iommu->iommu_lock)
>
> Please let me know what you think.

It makes sense. I can append it to my tree, or squash it (with your
S-o-B) if you want to send the full patch.

Thanks,
Jean


