[PATCH v4 06/16] iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev

Mostafa Saleh smostafa at google.com
Tue Feb 13 05:30:12 PST 2024


On Thu, Feb 01, 2024 at 09:24:43AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2024 at 12:15:53PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Thu, Jan 25, 2024 at 07:57:16PM -0400, Jason Gunthorpe wrote:
> > > The BTM support wants to be able to change the ASID of any smmu_domain.
> > > When it goes to do this it holds the arm_smmu_asid_lock and iterates over
> > > the target domain's devices list.
> > > 
> > > During attach of a S1 domain we must ensure that the devices list and
> > > CD are in sync, otherwise we could miss CD updates or a parallel CD update
> > > could push an out of date CD.
> > > 
> > > This is pretty complicated, and almost works today because
> > > arm_smmu_detach_dev() removes the master from the linked list before
> > > working on the CD entries, preventing parallel update of the CD.
> > > 
> > > However, it does have an issue where the CD can remain programmed while the
> > > domain appears to be unattached. arm_smmu_share_asid() will then not clear
> > > any CD entries and install its own CD entry with the same ASID
> > > concurrently. This creates a small race window where the IOMMU can see two
> > > ASIDs pointing to different translations.
> > 
> > I don’t see the race condition.
> > 
> > The current flow is as follows.
> > For SVA, if the ASID was used by domain_x, it will do:
> > 
> > lock(arm_smmu_asid_lock)
> > Alloc new asid and set cd->asid.
> > lock(domain_x->devices_lock)
> > Write new CD with the new asid
> > unlock(domain_x->devices_lock)
> > unlock(arm_smmu_asid_lock)
> > 
> > For attach_dev (domain_y), if the device was attached to domain_z
> > //Detach old domain
> > lock(domain_z->devices_lock)
> > Remove master from old domain
> > unlock(domain_z->devices_lock)
> 
> At this moment all locks are dropped and the RID's CD entry continues
> to use the ASID.
> 
> The racing BTM flow now runs and will do your above:
> 
> arm_smmu_mmu_notifier_get()
>  arm_smmu_alloc_shared_cd()
>   arm_smmu_share_asid():
>     arm_smmu_update_ctx_desc_devices() <<- Does nothing due to list_del above
>     arm_smmu_tlb_inv_asid() <<-- Woops, we are invalidating an ASID that is still in a CD!
>  arm_smmu_write_ctx_desc() <<-- Install a new translation on a PASID's CD
> 
> Now the HW can observe two installed CDs using the same ASID but they
> point to different translations. This is illegal.
> 
> > Clear CD
> 
> Now we remove the RID CD, but it is too late, the PASID CD is already
> installed.
> 
> ASID/VMID lifecycle must be strictly contained to ensure the cache
> remains coherent:
> 
> 1. All programmed STE/CDs using the ASID/VMID must always point to the
>    same translation
> 
> 2. All references to an ASID/VMID must be removed from their STE/CDs
>    before the ASID is flushed
> 
> 3. The ASID/VMID must be flushed before it is assigned to a STE/CD
>    with a new translation.
> 
> We solve this by requiring that the arm_smmu_asid_lock must be held
> such that the smmu_domains->devices list AND the actual content of the
> CD tables are always observed to be consistent.
> 
> Jason

I see, thanks a lot for the detailed explanation.
Maybe this can be added to the changelog, so it’s documented somewhere.

Also, I guess this is mainly theoretical, as it requires the detached device to
issue DMA while being detached?
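
To check my understanding of the interleaving, here is a minimal user-space
sketch. It is a toy model and not kernel code: every toy_* name, the ASID
values 5 and 6, and the integer "translation" IDs are made up for
illustration. It only models rule 1 (all programmed CDs using an ASID point
to the same translation) and replays the two orderings deterministically
instead of using real threads or locks. TLB flushing (rules 2 and 3) is not
modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers exist in the kernel. */
#define TOY_ASID_NONE 0u

struct toy_cd {
	unsigned int asid;	/* TOY_ASID_NONE means the CD is not programmed */
	int translation;	/* stand-in for the page table the CD points at */
};

struct toy_domain {
	bool master_on_list;	/* is the master still on the devices list? */
	struct toy_cd *rid_cd;	/* the CD the master's RID currently uses */
};

/* Rule 1: programmed CDs sharing an ASID must share a translation. */
static bool toy_cds_consistent(const struct toy_cd *cds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t j = i + 1; j < n; j++) {
			if (cds[i].asid == TOY_ASID_NONE ||
			    cds[i].asid != cds[j].asid)
				continue;
			if (cds[i].translation != cds[j].translation)
				return false;
		}
	}
	return true;
}

/*
 * Stand-in for the BTM path: move the old user of 'asid' to 'new_asid',
 * but only if its master is still on the devices list, then install a
 * PASID CD that reuses 'asid' with a different translation.
 */
static void toy_share_asid(struct toy_domain *dom, struct toy_cd *pasid_cd,
			   unsigned int asid, unsigned int new_asid,
			   int translation)
{
	if (dom->master_on_list && dom->rid_cd->asid == asid)
		dom->rid_cd->asid = new_asid;
	pasid_cd->asid = asid;
	pasid_cd->translation = translation;
}

/*
 * Replay a detach racing with toy_share_asid(). When detach_is_atomic is
 * false, the BTM path runs between "remove from list" and "clear CD", as
 * in the window described above. When true, it can only run after both
 * steps, which is what holding one lock across the whole detach gives.
 * Returns true if rule 1 was ever violated.
 */
static bool toy_violation_seen(bool detach_is_atomic)
{
	struct toy_cd cds[2] = { { 5, 100 }, { TOY_ASID_NONE, 0 } };
	struct toy_domain dom = { true, &cds[0] };
	bool violated = false;

	dom.master_on_list = false;		/* detach step 1: list_del */
	if (!detach_is_atomic) {
		toy_share_asid(&dom, &cds[1], 5, 6, 200);
		if (!toy_cds_consistent(cds, 2))
			violated = true;
	}
	cds[0].asid = TOY_ASID_NONE;		/* detach step 2: clear CD */
	if (detach_is_atomic)
		toy_share_asid(&dom, &cds[1], 5, 6, 200);
	if (!toy_cds_consistent(cds, 2))
		violated = true;
	return violated;
}
```

In this model toy_violation_seen(false) reports a violation, because the
RID CD and the PASID CD both carry ASID 5 with different translations for a
moment, while toy_violation_seen(true) does not.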

Thanks,
Mostafa
