[RFC PATCH 19/45] KVM: arm64: iommu: Add domains
Jean-Philippe Brucker
jean-philippe at linaro.org
Fri Jun 2 08:29:07 PDT 2023
Hi Mostafa,
On Fri, 19 May 2023 at 16:33, Mostafa Saleh <smostafa at google.com> wrote:
>
> Hi Jean,
>
> On Wed, Feb 01, 2023 at 12:53:03PM +0000, Jean-Philippe Brucker wrote:
> > +/*
> > + * Serialize access to domains and IOMMU driver internal structures (command
> > + * queue, device tables)
> > + */
> > +static hyp_spinlock_t iommu_lock;
> > +
> I was looking more into this lock and I think we can make it per IOMMU instead
> of having one big lock to avoid congestion, as I see it is only used to
> protect per-IOMMU resources.
Yes it's a major bottleneck, thanks for looking into this. I think we'll
eventually want to improve scalability within an IOMMU and even a domain:
a multi-queue network device will share a single domain between multiple
CPUs, each issuing lots of map/unmap calls. Or just two devices drivers
working on different CPUs and sharing one IOMMU. In those cases the
per-IOMMU lock won't be good enough and we'll be back to this problem, but
a per-IOMMU lock should already improve scalability for some systems.
Currently the global lock protects:
(1) The domain array (per IOMMU)
(2) The io-pgtables (per domain)
(3) The command queue (per SMMU)
(4) Power state (per SMMU)
I ran some experiments with refcounting the domains to avoid the lock for
(1) and (2), which improves map() scalability. See
https://jpbrucker.net/git/linux/commit/?h=pkvm/smmu-wip&id=5ad3bc6fd589b6f11fe31ccee5b8777ba8739167
(and another experiment to optimize the DMA refcount on branch pkvm/smmu-wip)
For (2), the io-pgtable-arm code already supports concurrent accesses
(2c3d273eabe8 ("iommu/io-pgtable-arm: Support lockless operation")). But
this one needs careful consideration because in the host, the io-pgtable
code trusts the device drivers. For example it expects that only buggy
drivers call map()/unmap() to the same IOVA concurrently. We need to make
sure a compromised host drivers can't exploit these assumptions to do
anything nasty.
There are options to improve (3) as well. The host SMMU driver supports
lockless command-queue (587e6c10a7ce ("iommu/arm-smmu-v3: Reduce
contention during command-queue insertion")) but it may be too complex to
put in the hypervisor (or rather, I haven't tried to understand it yet).
We could also wait for systems with ECMDQ, which enables per-CPU queues.
> Some special handling needed as hyp_spinlock_t is not exposed to EL1.
> Maybe something like this:
>
> diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
> index b257a3b4bfc5..96d30a37f9e6 100644
> --- a/arch/arm64/kvm/hyp/hyp-constants.c
> +++ b/arch/arm64/kvm/hyp/hyp-constants.c
> @@ -3,11 +3,13 @@
> #include <linux/kbuild.h>
> #include <nvhe/memory.h>
> #include <nvhe/pkvm.h>
> +#include <nvhe/spinlock.h>
>
> int main(void)
> {
> DEFINE(STRUCT_HYP_PAGE_SIZE, sizeof(struct hyp_page));
> DEFINE(PKVM_HYP_VM_SIZE, sizeof(struct pkvm_hyp_vm));
> DEFINE(PKVM_HYP_VCPU_SIZE, sizeof(struct pkvm_hyp_vcpu));
> + DEFINE(HYP_SPINLOCK_SIZE, sizeof(hyp_spinlock_t));
> return 0;
> }
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index a90b97d8bae3..cf9195e24a08 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -6,6 +6,7 @@ arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
>
> obj-$(CONFIG_ARM_SMMU_V3_PKVM) += arm_smmu_v3_kvm.o
> +ccflags-$(CONFIG_ARM_SMMU_V3_PKVM) += -Iarch/arm64/kvm/
> arm_smmu_v3_kvm-objs-y += arm-smmu-v3-kvm.o
> arm_smmu_v3_kvm-objs-y += arm-smmu-v3-common.o
> arm_smmu_v3_kvm-objs := $(arm_smmu_v3_kvm-objs-y)
> diff --git a/include/kvm/iommu.h b/include/kvm/iommu.h
> index ab888da731bc..82827b99b1ed 100644
> --- a/include/kvm/iommu.h
> +++ b/include/kvm/iommu.h
> @@ -5,6 +5,12 @@
> #include <asm/kvm_host.h>
> #include <kvm/power_domain.h>
> #include <linux/io-pgtable.h>
> +#ifdef __KVM_NVHE_HYPERVISOR__
> +#include <nvhe/spinlock.h>
> +#else
> +#include "hyp_constants.h"
> +#endif
>
> /*
> * Parameters from the trusted host:
> @@ -23,6 +29,11 @@ struct kvm_hyp_iommu {
>
> struct io_pgtable_params *pgtable;
> bool power_is_off;
> +#ifdef __KVM_NVHE_HYPERVISOR__
> + hyp_spinlock_t iommu_lock;
> +#else
> + u8 unused[HYP_SPINLOCK_SIZE];
> +#endif
> };
>
> struct kvm_hyp_iommu_memcache {
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index 1f4d5fcc1386..afaf173e65ed 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -14,12 +14,6 @@
>
> struct kvm_hyp_iommu_memcache __ro_after_init *kvm_hyp_iommu_memcaches;
>
> -/*
> - * Serialize access to domains and IOMMU driver internal structures (command
> - * queue, device tables)
> - */
> -static hyp_spinlock_t iommu_lock;
> -
> #define domain_to_iopt(_iommu, _domain, _domain_id) \
> (struct io_pgtable) { \
> .ops = &(_iommu)->pgtable->ops, \
> @@ -93,10 +87,10 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> {
> int ret = -EINVAL;
> struct io_pgtable iopt;
> - struct kvm_hyp_iommu *iommu;
> + struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
> struct kvm_hyp_iommu_domain *domain;
>
> - hyp_spin_lock(&iommu_lock);
> + hyp_spin_lock(&iommu->iommu_lock);
> domain = handle_to_domain(iommu_id, domain_id, &iommu);
> if (!domain)
> goto out_unlock;
> @@ -112,7 +106,7 @@ int kvm_iommu_alloc_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id,
> domain->refs = 1;
> domain->pgd = iopt.pgd;
> out_unlock:
> - hyp_spin_unlock(&iommu_lock);
> + hyp_spin_unlock(&iommu->iommu_lock);
> return ret;
> }
>
> @@ -120,10 +114,10 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
> {
> int ret = -EINVAL;
> struct io_pgtable iopt;
> - struct kvm_hyp_iommu *iommu;
> + struct kvm_hyp_iommu *iommu = kvm_iommu_ops.get_iommu_by_id(iommu_id);
> struct kvm_hyp_iommu_domain *domain;
>
> - hyp_spin_lock(&iommu_lock);
> + hyp_spin_lock(&iommu->iommu_lock);
> domain = handle_to_domain(iommu_id, domain_id, &iommu);
> if (!domain)
> goto out_unlock;
> @@ -137,7 +131,7 @@ int kvm_iommu_free_domain(pkvm_handle_t iommu_id, pkvm_handle_t domain_id)
> memset(domain, 0, sizeof(*domain));
>
> out_unlock:
> - hyp_spin_unlock(&iommu_lock);
> + hyp_spin_unlock(&iommu->iommu_lock);
> return ret;
> }
> --
>
> (I didn't include full patch as it is too long, but mainly the
> rest is s/&iommu_lock/&iommu->iommu_lock)
>
> Please let me know what do you think?
It makes sense, I can append it to my tree or squash it (with your S-o-B)
if you want to send the full patch
Thanks,
Jean
More information about the linux-arm-kernel
mailing list