[PATCH v5 00/27] Update SMMUv3 to the modern iommu API (part 2/3)

Mostafa Saleh smostafa at google.com
Mon Mar 25 03:22:15 PDT 2024


Hi Jason,

On Mon, Mar 04, 2024 at 07:43:48PM -0400, Jason Gunthorpe wrote:
> Continuing the work of part 1 this focuses on the CD, PASID and SVA
> components:
> 
>  - attach_dev failure does not change the HW configuration.
> 
>  - Full PASID API support including:
>     - S1/SVA domains attached to PASIDs
>     - IDENTITY/BLOCKED/S1 attached to RID
>     - Change of the RID domain while PASIDs are attached
> 
>  - Streamlined SVA support using the core infrastructure
> 
>  - Hitless, whenever possible, change between two domains
> 
> Making the CD programming work like the new STE programming allows
> untangling some of the confusing SVA flows. From there the focus is on
> building out the core infrastructure for dealing with PASID and CD
> entries, then keeping track of unique SSID's for ATS invalidation.
> 
> The ATS ordering is generalized so that the PASID flow can use it and put
> into a form where it is fully hitless, whenever possible. Care is taken to
> ensure that ATC flushes are present after any change in translation.
> 
> Finally we simply kill the entire outdated SVA mmu_notifier implementation
> in one shot and switch it over to the newly created generic PASID & CD
> code. This avoids the messy and confusing approach of trying to
> incrementally untangle this in place. The new code is small and simple
> enough this is much better than trying to figure out smaller steps.
> 
> Once SVA is resting on the right CD code it is straightforward to make the
> PASID interface functionally complete.
> 
> It achieves the same goals as the several series from Michael and the S1DSS
> series from Nicolin that were trying to improve portions of the API.
> 
> This is on github:
> https://github.com/jgunthorpe/linux/commits/smmuv3_newapi

Testing on qemu[1], with the same VMM Shameer tested with[2]:
qemu/build/qemu-system-aarch64 -M virt -machine virt,gic-version=3,iommu=nested-smmuv3,iommufd=iommufd0 \
-cpu cortex-a53,pmu=off -smp 1 -m 2048 \
-kernel Image \
-drive file=rootfs.ext4,if=virtio,format=raw  \
-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -nographic  \
-append 'console=ttyAMA0 rootwait root=/dev/vda' \
-device virtio-scsi-pci,id=scsi0  \
-device ioh3420,id=pcie.1,chassis=1 \
-object iommufd,id=iommufd0 \
-device vfio-pci,host=0000:00:03.0,iommufd=iommufd0

I see the following panic:

[  155.141233] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  155.142416] Mem abort info:
[  155.142722]   ESR = 0x0000000086000004
[  155.143106]   EC = 0x21: IABT (current EL), IL = 32 bits
[  155.143827]   SET = 0, FnV = 0
[  155.144266]   EA = 0, S1PTW = 0
[  155.144721]   FSC = 0x04: level 0 translation fault
[  155.145432] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101059000
[  155.146234] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  155.148162] Internal error: Oops: 0000000086000004 [#1] PREEMPT SMP
[  155.149399] Modules linked in:
[  155.150366] CPU: 2 PID: 371 Comm: qemu-system-aar Not tainted 6.8.0-rc7-gde77230ac23a #9
[  155.151728] Hardware name: linux,dummy-virt (DT)
[  155.152770] pstate: 81400809 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=-c)
[  155.153895] pc : 0x0
[  155.154889] lr : iommufd_hwpt_invalidate+0xa4/0x204
[  155.156272] sp : ffff800080f3bcc0
[  155.156971] x29: ffff800080f3bcf0 x28: ffff0000c369b300 x27: 0000000000000000
[  155.158135] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[  155.159175] x23: 0000000000000000 x22: 00000000c1e334a0 x21: ffff0000c1e334a0
[  155.160343] x20: ffff800080f3bd38 x19: ffff800080f3bd58 x18: 0000000000000000
[  155.161298] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff8240d6d8
[  155.162355] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  155.163463] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[  155.164947] x8 : 0000001000000002 x7 : 0000fffeac1ec950 x6 : 0000000000000000
[  155.166057] x5 : ffff800080f3bd78 x4 : 0000000000000003 x3 : 0000000000000002
[  155.167343] x2 : 0000000000000000 x1 : ffff800080f3bcc8 x0 : ffff0000c6034d80
[  155.168851] Call trace:
[  155.169738]  0x0
[  155.170623]  iommufd_fops_ioctl+0x154/0x274
[  155.171555]  __arm64_sys_ioctl+0xac/0xf0
[  155.172095]  invoke_syscall+0x48/0x110
[  155.172633]  el0_svc_common.constprop.0+0x40/0xe0
[  155.173277]  do_el0_svc+0x1c/0x28
[  155.173847]  el0_svc+0x34/0xb4
[  155.174312]  el0t_64_sync_handler+0x120/0x12c
[  155.174969]  el0t_64_sync+0x190/0x194
[  155.176006] Code: ???????? ???????? ???????? ???????? (????????)
[  155.178349] ---[ end trace 0000000000000000 ]---

The core IOMMUFD code calls domain->ops->cache_invalidate_user
unconditionally from the IOMMU_HWPT_INVALIDATE ioctl, but the SMMUv3
driver doesn't implement it, hence the jump to address 0 above (pc is 0x0
and lr is in iommufd_hwpt_invalidate). That callback seems to be missing,
as otherwise the VMM can't invalidate S1 mappings, or am I missing something?
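
To illustrate the failure mode, here is a minimal user-space sketch of an
ops table with an optional callback and a dispatcher that checks it before
calling. This is not the kernel code: the demo_* structures and functions
are hypothetical stand-ins, and returning -EOPNOTSUPP when the callback is
absent is only an assumption about how such a call could be guarded, not a
proposed fix.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct demo_domain;

struct demo_domain_ops {
	/* Optional callback: a driver may leave this NULL. */
	int (*cache_invalidate_user)(struct demo_domain *domain);
};

struct demo_domain {
	const struct demo_domain_ops *ops;
};

/*
 * Dispatch a user invalidation request. Calling through a NULL function
 * pointer is what produces a branch to address 0, so the pointer is
 * checked first and an error is returned instead.
 */
static int demo_hwpt_invalidate(struct demo_domain *domain)
{
	if (!domain->ops || !domain->ops->cache_invalidate_user)
		return -EOPNOTSUPP;
	return domain->ops->cache_invalidate_user(domain);
}

/* Trivial callback standing in for a driver that implements the op. */
static int demo_invalidate_ok(struct demo_domain *domain)
{
	(void)domain;
	return 0;
}
```

With a domain whose ops leave cache_invalidate_user NULL, the dispatcher
returns -EOPNOTSUPP; with demo_invalidate_ok installed it returns 0.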


[1] https://lore.kernel.org/all/20240325101442.1306300-1-smostafa@google.com/
[2] https://github.com/nicolinc/qemu/commits/wip/iommufd_vsmmu-02292024/

> 
> v5:
>  - Rebase on v6.8-rc7 & Will's tree
>  - Accommodate the SVA rc patch removing the master list iteration
>  - Move the kfree(to_smmu_domain(domain)) hunk to the right patch
>  - Move S1DSS get_used hunk to "Allow IDENTITY/BLOCKED to be set while
>    PASID is used"
> v4: https://lore.kernel.org/r/0-v4-e7091cdd9e8d+43b1-smmuv3_newapi_p2_jgg@nvidia.com
>  - Rebase on v6.8-rc1, adjust to use mm_get_enqcmd_pasid() and eventually
>    remove all references from ARM. Move the new ARM_SMMU_FEAT_STALL_FORCE
>    stuff to arm_smmu_make_sva_cd()
>  - Adjust to use the new shared STE/CD writer logic. Disable some of the
>    sanity checks for the interior of the series
>  - Return ERR_PTR from domain_alloc functions
>  - Move the ATS disablement flow into arm_smmu_attach_prepare()/commit()
>    which lets all the STE update flows use the same sequence. This is
>    needed for nesting in part 3
>  - Put ssid in attach_state
>  - Replace to_smmu_domain_safe() with to_smmu_domain_devices()
> v3: https://lore.kernel.org/r/0-v3-9083a9368a5c+23fb-smmuv3_newapi_p2_jgg@nvidia.com
>  - Rebase on the latest part 1
>  - update comments and commit messages
>  - Fix error exit in arm_smmu_set_pasid()
>  - Fix inverted logic for btm_invalidation
>  - Add missing ATC invalidation on mm release
>  - Add a big comment explaining that BTM is not enabled and what is
>    missing to enable it.
> v2: https://lore.kernel.org/r/0-v2-16665a652079+5947-smmuv3_newapi_p2_jgg@nvidia.com
>  - Rebased on iommufd + Joerg's tree
>  - Use sid_smmu_domain consistently to refer to the domain attached to the
>    device (eg the PCIe RID)
>  - Rework how arm_smmu_attach_*() and callers flow to be more careful
>    about ordering around ATC invalidation. The ATC must be invalidated
>    after it is impossible to establish stale entries.
>  - ATS disable is now entirely part of arm_smmu_attach_dev_ste(), which is
>    the only STE type that ever disables ATS.
>  - Remove the 'existing_master_domain' optimization, the code is
>    functionally fine without it.
>  - Whitespace, spelling, and checkpatch related items
>  - Fixed wrong value stored in the xa for the BTM flows
>  - Use pasid more consistently instead of id
> v1: https://lore.kernel.org/r/0-v1-afbb86647bbd+5-smmuv3_newapi_p2_jgg@nvidia.com
> 
> Jason Gunthorpe (27):
>   iommu/arm-smmu-v3: Do not allow a SVA domain to be set on the wrong
>     PASID
>   iommu/arm-smmu-v3: Do not ATC invalidate the entire domain
>   iommu/arm-smmu-v3: Add a type for the CD entry
>   iommu/arm-smmu-v3: Add an ops indirection to the STE code
>   iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
>   iommu/arm-smmu-v3: Consolidate clearing a CD table entry
>   iommu/arm-smmu-v3: Move the CD generation for S1 domains into a
>     function
>   iommu/arm-smmu-v3: Move allocation of the cdtable into
>     arm_smmu_get_cd_ptr()
>   iommu/arm-smmu-v3: Allocate the CD table entry in advance
>   iommu/arm-smmu-v3: Move the CD generation for SVA into a function
>   iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
>   iommu/arm-smmu-v3: Start building a generic PASID layer
>   iommu/arm-smmu-v3: Make smmu_domain->devices into an allocated list
>   iommu/arm-smmu-v3: Make changing domains be hitless for ATS
>   iommu/arm-smmu-v3: Add ssid to struct arm_smmu_master_domain
>   iommu/arm-smmu-v3: Keep track of valid CD entries in the cd_table
>   iommu/arm-smmu-v3: Thread SSID through the arm_smmu_attach_*()
>     interface
>   iommu/arm-smmu-v3: Make SVA allocate a normal arm_smmu_domain
>   iommu/arm-smmu-v3: Keep track of arm_smmu_master_domain for SVA
>   iommu: Add ops->domain_alloc_sva()
>   iommu/arm-smmu-v3: Put the SVA mmu notifier in the smmu_domain
>   iommu/arm-smmu-v3: Consolidate freeing the ASID/VMID
>   iommu/arm-smmu-v3: Move the arm_smmu_asid_xa to per-smmu like vmid
>   iommu/arm-smmu-v3: Bring back SVA BTM support
>   iommu/arm-smmu-v3: Allow IDENTITY/BLOCKED to be set while PASID is
>     used
>   iommu/arm-smmu-v3: Allow a PASID to be set when RID is
>     IDENTITY/BLOCKED
>   iommu/arm-smmu-v3: Allow setting a S1 domain to a PASID
> 
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  639 +++++-----
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 1036 +++++++++++------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   79 +-
>  drivers/iommu/iommu-sva.c                     |    4 +-
>  drivers/iommu/iommu.c                         |   12 +-
>  include/linux/iommu.h                         |    3 +
>  6 files changed, 1024 insertions(+), 749 deletions(-)
> 
> 
> base-commit: 98b23ebb0c84657a135957d727eedebd1280cbbf
> -- 
> 2.43.2
> 


