[PATCH 1/2] iommu: Fix race condition during default domain allocation

Fri Jun 11 11:30:25 PDT 2021

> > +     mutex_lock(&group->mutex);
> >       iommu_alloc_default_domain(group, dev);
> > +     mutex_unlock(&group->mutex);
> 
> It feels wrong to serialise this for everybody just to cater for systems with
> aliasing SIDs between devices.

Serialization is limited to devices in the same group. Unless devices share SID, they wouldn't be in same group.

> Can you provide some more information about exactly what the h/w
> configuration is, and the callstack which exhibits the race, please?

The failure is an after effect and is a page fault.  Don't have a failure call stack here. Ashish has traced it through print messages and he can provide them.

>From the prints messages, The following was observed in page fault case:

Device1:  iommu_probe_device() --> iommu_alloc_default_domain() --> iommu_group_alloc_default_domain() --> __iommu_attach_device(group->default_domain)
Device2:  iommu_probe_device() --> iommu_alloc_default_domain() --> iommu_group_alloc_default_domain() --> __iommu_attach_device(group->default_domain)

Both devices(with same SID) are entering into iommu_group_alloc_default_domain() function and each one getting attached to a different group->default_domain 
as the second one overwrites  group->default_domain after the first one attaches to group->default_domain it has created.

SMMU would be setup to use first domain for the context page table. Whereas all the dma map/unamp requests from second device would
be performed on a domain that is not used by SMMU for context translations and IOVA (not mapped in first domain) accesses from second device lead to page faults. 

-KR