[PATCH v2 11/19] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev()

Jason Gunthorpe jgg at nvidia.com
Thu Nov 16 08:28:09 PST 2023


On Wed, Nov 15, 2023 at 11:15:23PM +0800, Michael Shavit wrote:
> On Tue, Nov 14, 2023 at 1:53 AM Jason Gunthorpe <jgg at nvidia.com> wrote:
> >
> > This was needed because the STE code required the STE to be in
> > ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE code
> > can automatically handle all transitions we can remove this step
> > from the attach_dev flow.
> >
> > A few small bugs exist because of this:
> >
> > 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
> >    then there will be a moment where the STE points at BYPASS. Since
> >    this can be done by VFIO/IOMMUFD it is a small security race.
> >
> > 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
> >    regions will temporarily become BLOCKED. We'd like drivers to
> >    work in a way that allows IOMMU_RESV_DIRECT to be continuously
> >    functional during these transitions.
> >
> > Make arm_smmu_release_device() put the STE back to the correct
> > ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT was ignored on
> > this path.
> >
> > Notice this subtly depends on the prior arm_smmu_asid_lock change as the
> > STE must be put to non-paging before removing the device from the linked
> > list to avoid races with arm_smmu_share_asid().
> 
> I'm a little confused by this comment. Is this suggesting that
> arm_smmu_detach_dev had a race condition before the arm_smmu_asid_lock
> changes, since it deletes the list entry before deactivating the STE
> that uses the domain and without grabbing the asid_lock, thus allowing
> a gap where the ASID might be re-acquired by an SVA domain while an
> STE with that ASID is still live on this device? Wouldn't that belong
> on the asid_lock patch instead if so?

I wasn't intending to say there is an existing bug; this was more to
point out why the code was organized like this, and why it is OK to
remove the detach-time manipulation of the STE considering races with
share_asid.

However, I agree that the code in rc1 is troubled and fixed in the
prior patch:

	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
	list_del(&master->domain_head);
	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

^^^^ Prevents arm_smmu_update_ctx_desc_devices() from storing to the STE
     However the STE is still pointing at the ASID

	master->domain = NULL;
	master->ats_enabled = false;
	arm_smmu_install_ste_for_dev(master);

^^^^ Now the STE is gone, so the CD becomes unreferenced

	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab)
		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);

^^^^ Now the CD is non-valid

I was primarily concerned with corrupting the CD, i.e. that share_asid
would race and un-clear the write_ctx_desc(). That is prevented by the
ordering above.

However, I agree the above is still problematic because there is a
short time window where the ASID can be installed in two CDs with two
different translations. I suppose there is a security issue where this
could corrupt the IOTLB.

This is all fixed in this series too by having more robust locking. So
this does deserve a note in the commit message for the earlier patch
about this issue.

> > @@ -2852,9 +2846,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> >  static void arm_smmu_release_device(struct device *dev)
> >  {
> >         struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +       struct arm_smmu_ste target;
> >
> >         if (WARN_ON(arm_smmu_master_sva_enabled(master)))
> >                 iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> > +
> > +       /* Put the STE back to what arm_smmu_init_strtab() sets */
> 
> Hmmmm, it seems like checking iommu->require_direct may put STEs in
> bypass in scenarios where arm_smmu_init_strtab() wouldn't have.
> arm_smmu_init_strtab is calling iort_get_rmr_sids to pick streams to
> put into bypass, but IIUC iommu->require_direct also applies to
> dts-based reserved-memory regions, not just iort.

Indeed, that actually looks like a little bug, as the DT should
technically have the same behavior as the IORT.. I'm going to ignore it
:)

> I'm not very familiar with the history behind disable_bypass; why is
> putting an entire stream into bypass the correct behavior if a
> reserved-memory (which may be for a small finite region) exists?

This specific reserved memory region is requesting a 1:1 translation
for a chunk of IOVA. This translation is being used by some agent
outside Linux's knowledge and the desire is for the translation to
always be in effect.

So, if we put the STE to ABORT then the translation will stop working
with unknown side effects.

This is also why we install the translation in the DMA domain and
block use of VFIO if these are set - to ensure the 1:1 translation is
always there.

Thanks,
Jason
