[PATCH 04/19] iommu/arm-smmu-v3: Make STE programming independent of the callers

Jason Gunthorpe jgg at nvidia.com
Fri Jan 12 11:45:52 PST 2024


On Sat, Jan 06, 2024 at 04:50:48PM +0800, Michael Shavit wrote:
> > I'm fine with this, if you think it is better please sort out the rest
> > of the bits and send me a diff and I'll integrate it
> >
> > Thanks,
> > Jason
> 
> Integrated and re-sent the 3 relevant patches; although git-send-email
> gave two of them a different subject so they may appear as a different
> thread depending on your email client.
> 
> Please note that I didn't update the commit description on the two
> patches that you initially wrote. Also note that the Kunit test patch
> does not yet add tests for any CD updates (outside of generic logic
> which is shared with STE update). I'm on vacation for the next week
> and haven't had a chance to expand the test coverage.

Okay, I took care of it all and got the branch rebased onto something
closer to what v6.8-rc1 will look like. I made a bunch of cosmetic
changes and checked the unit test still works. I'll post the three
parts to the list when v6.8-rc1 comes out in a week's time.

The update is on my github:

https://github.com/jgunthorpe/linux/commits/smmuv3_newapi/

Here is the rewritten commit message:

iommu/arm-smmu-v3: Make STE programming independent of the callers

As the comment in arm_smmu_write_strtab_ent() explains, this routine has
been limited to only work correctly in certain scenarios that the caller
must ensure. Generally the caller must put the STE into ABORT or BYPASS
before attempting to program it to something else.

The iommu core APIs would ideally expect the driver to do a hitless change
of iommu_domain in a number of cases:

 - RESV_DIRECT support wants IDENTITY -> DMA -> IDENTITY to be hitless
   for the RESV ranges

 - PASID upgrade has IDENTITY on the RID with no PASID, then a PASID paging
   domain is installed. The RID should not be impacted

 - PASID downgrade has IDENTITY on the RID and all PASIDs removed.
   The RID should not be impacted

 - RID does PAGING -> BLOCKING with an active PASID; PASIDs should not be
   impacted

 - NESTING -> NESTING for carrying all the above hitless cases in a VM
   into the hypervisor. To comprehensively emulate the HW in a VM we should
   assume the VM OS is running logic like this and expecting hitless updates
   to be relayed to real HW.

For CD updates arm_smmu_write_ctx_desc() has a similar comment explaining
how limited it is, and the driver does have a need for hitless CD updates:

 - SMMUv3 BTM S1 ASID re-label

 - SVA mm release should change the CD to answer not-present to all
   requests without allowing logging (EPD0)

The next patches/series are going to start removing some of this logic
from the callers, and add more complex state combinations than exist today.
At the end everything that can be hitless will be hitless, including all
of the above.

Introduce arm_smmu_write_entry() which will run through the multi-qword
programming sequence to avoid creating an incoherent 'torn' STE in the HW
caches. It automatically detects which of two algorithms to use:

1) The disruptive V=0 update described in the spec, which invalidates the
   entry and does three syncs to make the change:
       - Write V=0 to QWORD 0
       - Write the entire STE except QWORD 0
       - Write QWORD 0

2) A hitless update algorithm that follows the same rationale the driver
   already uses. It is safe to change IGNORED bits that the HW doesn't use:
       - Write the target value into all currently unused bits
       - Write a single QWORD; this makes the new STE live atomically
       - Ensure bits that are now unused are set to 0
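
To make the sequence concrete, here is a simplified sketch of the core
logic, modeled on the code in the branch above (helper names such as
entry_set(), which writes a range of qwords and then syncs, follow the
branch but details are abridged):

static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
                                 __le64 *entry, const __le64 *target)
{
        __le64 unused_update[NUM_ENTRY_QWORDS];
        u8 used_qword_diff;

        /* arm_smmu_entry_qword_diff() is sketched further below */
        used_qword_diff = arm_smmu_entry_qword_diff(writer, entry, target,
                                                    unused_update);
        if (hweight8(used_qword_diff) == 1) {
                /* Algorithm 2: only one qword of used bits must change */
                unsigned int critical = ffs(used_qword_diff) - 1;

                /* Update every bit the HW is currently ignoring */
                unused_update[critical] = entry[critical];
                entry_set(writer, entry, unused_update, 0, NUM_ENTRY_QWORDS);
                /* One atomic qword write makes the new entry live */
                entry_set(writer, entry, target, critical, 1);
                /* Zero the bits the target no longer uses */
                entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
        } else if (used_qword_diff) {
                /* Algorithm 1: disruptive update through V=0 */
                unused_update[0] = 0;
                entry_set(writer, entry, unused_update, 0, 1);
                entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1);
                entry_set(writer, entry, target, 0, 1);
        } else {
                /* No used bit changes at all, one write+sync is enough */
                entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
        }
}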

The detection of which path to use and the implementation of the hitless
update rely on a "used bitmask" describing what bits the HW is actually
using based on the V/CFG/etc bits. This follows the spec language, where
such bits are typically marked IGNORED.
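
For illustration, the STE flavour of this callback is shaped roughly as
follows (heavily abridged; the real function tracks many more fields per
CFG value, and the caller zero-initializes used_bits):

static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
{
        /* V itself is always used; with V=0 everything else is IGNORED */
        used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
        if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
                return;

        used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
        switch (FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]))) {
        case STRTAB_STE_0_CFG_ABORT:
        case STRTAB_STE_0_CFG_BYPASS:
                break;
        case STRTAB_STE_0_CFG_S1_TRANS:
                used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
                                            STRTAB_STE_0_S1CTXPTR_MASK |
                                            STRTAB_STE_0_S1CDMAX);
                used_bits[1] |= cpu_to_le64(STRTAB_STE_1_S1DSS);
                break;
        case STRTAB_STE_0_CFG_S2_TRANS:
                used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID |
                                            STRTAB_STE_2_VTCR);
                used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
                break;
        }
}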

Knowing which bits the HW is using, we can update the bits it does not use
and then compute how many qwords need to be changed. If only one qword
needs to be updated, the hitless algorithm is possible.
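
Conceptually that computation looks like this (a sketch following the
branch; it also produces the intermediate unused_update value consumed by
arm_smmu_write_entry() above):

static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
                                    const __le64 *entry, const __le64 *target,
                                    __le64 *unused_update)
{
        __le64 target_used[NUM_ENTRY_QWORDS] = {};
        __le64 cur_used[NUM_ENTRY_QWORDS] = {};
        u8 used_qword_diff = 0;
        unsigned int i;

        writer->ops->get_used(entry, cur_used);
        writer->ops->get_used(target, target_used);

        for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
                /* Bits the current config ignores can take their target
                 * value right away without the HW noticing */
                unused_update[i] = (entry[i] & cur_used[i]) |
                                   (target[i] & ~cur_used[i]);
                /* Flag qwords where a bit the HW is using still has to
                 * change after the unused bits have been updated */
                if ((unused_update[i] & target_used[i]) != target[i])
                        used_qword_diff |= 1 << i;
        }
        return used_qword_diff;
}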

Later patches will include CD updates in this mechanism, so make the
implementation generic using a struct arm_smmu_entry_writer and struct
arm_smmu_entry_writer_ops that abstract the differences between STE and CD
so that either can be plugged in.
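
On the branch that abstraction is a small ops vtable, roughly:

struct arm_smmu_entry_writer;

/* Hooks encapsulating the STE vs CD differences */
struct arm_smmu_entry_writer_ops {
        /* Compute the bitmask of bits the HW reads for this entry value */
        void (*get_used)(const __le64 *entry, __le64 *used);
        /* Flush cached copies after each step, e.g. CMDQ_OP_CFGI_STE */
        void (*sync)(struct arm_smmu_entry_writer *writer);
};

struct arm_smmu_entry_writer {
        const struct arm_smmu_entry_writer_ops *ops;
        struct arm_smmu_master *master;
};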

At this point it generates the same sequence of updates as the current
code, except that zeroing the VMID on entry to BYPASS/ABORT will do an
extra sync (this seems to be an existing bug).

Going forward this will use a V=0 transition instead of cycling through
ABORT when a non-hitless change is required. This seems more appropriate:
ABORT will fail DMAs without any logging, while a DMA dropped due to a
transient V=0 probably signals a bug, so the C_BAD_STE event it raises is
valuable.


