[PATCH rc] iommu/arm-smmu-v3: Do not use GFP_KERNEL under a spinlock

Jason Gunthorpe jgg at nvidia.com
Thu Feb 8 06:27:29 PST 2024


On Thu, Feb 08, 2024 at 12:58:47PM +0000, Will Deacon wrote:
> On Fri, Feb 02, 2024 at 10:28:05AM -0400, Jason Gunthorpe wrote:
> > If the SMMU is configured to use a two level CD table then
> > arm_smmu_write_ctx_desc() allocates a CD table leaf internally using
> > GFP_KERNEL. Due to recent changes this is being done under a spinlock to
> > iterate over the device list - thus it will trigger a sleeping while
> > atomic warning:
> > 
> >   arm_smmu_sva_set_dev_pasid()
> >     mutex_lock(&sva_lock);
> >     __arm_smmu_sva_bind()
> >      arm_smmu_mmu_notifier_get()
> >       spin_lock_irqsave()
> >       arm_smmu_write_ctx_desc()
> > 	arm_smmu_get_cd_ptr()
> >          arm_smmu_alloc_cd_leaf_table()
> > 	  dmam_alloc_coherent(GFP_KERNEL)
> > 
> > This is a 64K high order allocation and really should not be done
> > atomically.
> > 
> > The proper answer is to sequence CD programming in a way that does not
> > require the device list or a spinlock. Part two of my larger series does
> > this and already solves this bug.
> > 
> > In the meantime we can solve the functional issue by preallocating the CD
> > entry outside any locking. CD leaf tables are effectively never freed so this
> > guarantees the arm_smmu_write_ctx_desc() inside the lock will not take the
> > allocation path.
> > 
> > Fixes: 24503148c545 ("iommu/arm-smmu-v3: Refactor write_ctx_desc")
> > Reported-by: Dan Carpenter <dan.carpenter at linaro.org>
> > Closes: https://lore.kernel.org/all/4e25d161-0cf8-4050-9aa3-dfa21cd63e56@moroto.mountain/
> > Signed-off-by: Jason Gunthorpe <jgg at nvidia.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 +++++
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c     | 2 +-
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h     | 2 ++
> >  3 files changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > index 05722121f00e70..068d60732e6481 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > @@ -588,6 +588,11 @@ static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
> >  {
> >  	int ret = 0;
> >  	struct mm_struct *mm = domain->mm;
> > +	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +
> > +	/* Preallocate the required leafs outside locks */
> > +	if (!arm_smmu_get_cd_ptr(master, id))
> > +		return -ENOMEM;
> 
> What guarantees that all the masters in the domain share the same l2 CD
> table?

Nothing, in fact they don't.

However, a multi-master RID domain is something that doesn't fully
work with SVA today. It is functionally wrong, and a security problem,
for this code to install the PASID into all the masters that share the
RID domain.

In practice that doesn't happen, because the iommu group size is
restricted to 1 when PASID is enabled, and the DMA API is the thing
that has created a unique per-group domain, which is what makes it
possible to turn SVA on at all.

So, the RID domain's list has one master entry that is the same as the
set_dev_pasid's master.

Which is why this patch works: it preallocates for the only master
that matters in the only configuration where SVA works and is used
today.

iommufd could violate these assumptions which is part of why I'm
fixing all of this in preparation to enable PASID support for all
drivers in iommufd.

> The only reason we need the spinlock is to walk over the device
> list, so if that's not necessary then we could push this down into
> arm_smmu_mmu_notifier_get().

Pushing stuff down into arm_smmu_mmu_notifier_get() is the wrong
direction. CD manipulation needs to be lifted up and out into the
generic PASID level, not pushed down toward more SVA specific code.

This patch is where I got rid of the device list iteration:

https://lore.kernel.org/linux-iommu/11-v4-e7091cdd9e8d+43b1-smmuv3_newapi_p2_jgg@nvidia.com/

By lifting the CD programming up and then into the developing generic
PASID layer.

It is worth jumping ahead to the end: once part 2 is done, the
set_dev_pasid SVA op looks like this:

static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
				      struct device *dev, ioasid_t id)
{
	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
	struct arm_smmu_cd target;
	int ret;

	/* Prevent arm_smmu_mm_release from being called while we are attaching */
	if (!mmget_not_zero(domain->mm))
		return -EINVAL;

	/*
	 * This does not need the arm_smmu_asid_lock because SVA domains never
	 * get reassigned
	 */
	arm_smmu_make_sva_cd(&target, master, smmu_domain->domain.mm,
			     smmu_domain->asid,
			     smmu_domain->btm_invalidation);

	ret = arm_smmu_set_pasid(master, smmu_domain, id, &target);

	mmput(domain->mm);
	return ret;
}

Where the CD programming is in arm_smmu_set_pasid(), and that code is
fully shared between attaching a normal S1 paging domain and attaching
an SVA domain to a PASID.
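The shape this converges on can be sketched generically (hypothetical
names and a toy descriptor layout, not the driver's API): two "make"
helpers build a target CD, and one shared routine installs it, so the
S1 and SVA attach paths differ only in how the target is filled in.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 8-word context descriptor, in the role of struct arm_smmu_cd. */
struct cd {
	uint64_t w[8];
};

static struct cd cd_table_entry;	/* stands in for the real CD table slot */

/* The single shared programming path (the role arm_smmu_set_pasid() plays).
 * Everything from here down is identical for S1 and SVA attaches. */
static int set_pasid(const struct cd *target)
{
	/* The real driver performs carefully ordered 64-bit writes here;
	 * a plain copy is enough for the sketch. */
	cd_table_entry = *target;
	return 0;
}

/* Two different target builders feed the same programming path. */
static void make_s1_cd(struct cd *t, uint64_t ttbr)
{
	memset(t, 0, sizeof(*t));
	t->w[0] = 1;		/* valid */
	t->w[1] = ttbr;
}

static void make_sva_cd(struct cd *t, uint64_t asid)
{
	memset(t, 0, sizeof(*t));
	t->w[0] = 1;		/* valid */
	t->w[2] = asid;
}
```

The point of the split is that all locking and ordering concerns live
in the one shared install routine, while the builders stay trivial.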

Jason
