[PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user

Mon Mar 20 14:22:42 PDT 2023

On Mon, Mar 20, 2023 at 03:45:54PM -0300, Jason Gunthorpe wrote:
> On Mon, Mar 20, 2023 at 09:59:45AM -0700, Nicolin Chen wrote:
> > On Mon, Mar 20, 2023 at 01:04:35PM -0300, Jason Gunthorpe wrote:
> > 
> > > > > We need an ioctl for this, I think. To load a map of vSID to dev_id
> > > > > into the driver. Kernel will convert dev_id to pSID. Driver will
> > > > > program the map into HW.
> > > > 
> > > > Can we just pass a vSID via the alloc ioctl like this?
> > > > 
> > > > -----------------------------------------------------------
> > > > @@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
> > > >  #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
> > > >         __u64 flags;
> > > >         __u32 s2vmid;
> > > > -       __u32 __reserved;
> > > > +       __u32 sid;
> > > >         __u64 s1ctxptr;
> > > >         __u64 s1cdmax;
> > > >         __u64 s1fmt;
> > > > -----------------------------------------------------------
> > > > 
> > > > An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
> > > > an SID filed anyway.
> > > 
> > > No, a HWPT is not a device or a SID. a HWPT is an ASID in the ARM
> > > model.
> > > 
> > > dev_id is the SID.
> > > 
> > > The cfgi_ste will carry the vSID which is mapped to a iommufd dev_id.
> > > 
> > > The kernel has to translate the vSID to the dev_id to the pSID to
> > > issue an ATC invalidation for the correct entity.
> > 
> > OK. This narrative makes sense. I think our solution (the entire
> > stack) here mixes these two terms between HWPT/ASID and STE/SID.
> 
> HWPT is an "ASID/DID" on Intel and a CD table on SMMUv3
> 
> > What QEMU does is trapping an SMMU_CMD_CFGI_STE command to send
> > the host an HWPT alloc ioctl. The former one is based on an SID
> > or a device, while the latter one is based on ASID.
> > 
> > So the correct way should be for QEMU to maintain an ASID-based
> > list, corresponding to the s1ctxptr from STEs, and only send an
> > alloc ioctl upon a new s1ctxptr/ASID. Meanwhile, at every trap
> > of SMMU_CMD_CFGI_STE, it calls a separate ioctl to tie a vSID to
> > a dev_id (and pSID accordingly).
> 
> It is not ASID, it just s1ctxptr's - de-duplicate them.

SMMU has "ASID" too. And it's one per CD table. It can be also
seen as one per iommu_domain.

The following are lines from arm_smmu_domain_finalise_s1():
	...
	ret = xa_alloc(&arm_smmu_asid_xa, &asid, &cfg->cd,
		       XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
	...
	cfg->cd.asid    = (u16)asid;
	...

> Do something about SMMUv3 not being able to interwork iommu_domains
> across instances

I don't follow this one. Device instances?

> > In another word, an SMMU_CMD_CFGI_STE should do a mandatory SID
> > ioctl and an optional HWPT alloc ioctl (only allocates a HWPT if
> > the s1ctxptr in the STE is new).
> 
> No, there is no SID ioctl at the STE stage.
> 
> The vSID was decided by qemu before the VM booted. It created it when
> it built the vRID and the vPCI device. The vSID is tied to the vfio
> device FD.
> 
> Somehow the VM knows the relationship between vSID and vPCI/vRID. IIRC
> this is passed in through ACPI from qemu.

Yes.

> So vSID is an alais for the dev_id in iommfd language, and quemu
> always has a translation table for it.

I see.

> So CFGI_STE maps to allocating a de-duplicated HWPT for the CD table,
> and then a replace operation on the device FD represented by the vSID
> to change the pSTE to point to the HWPT.
> 
> The HWPT is effectively the "shadow STE".

IIUIC, the ioctl for the link of vSID/dev_id should happen at
the stage when boot boots, while the HWPT alloc ioctl happens
at CFGI_STE.

> > What could be a good prototype of the ioctl? Would it be a VFIO
> > device one or IOMMUFD one?
> 
> If we load the vSID table it should be a iommufd one, linked to the
> ARM SMMUv3 driver and probably take in a pointer to an array of
> vSID/dev_id pairs. Maybe an add/remove type of operation.

Will try some solution.

> > > I would expect that if invalidation can fail that we have a way to
> > > signal that failure back to the guest.
> > 
> > That's plausible to me, and it could apply to a translation
> > fault too. So, should we add back the iommufd infrastructure
> > for the fault injection (without PRI), in v2?
> 
> It would be nice if things were not so big, I don't think we need to
> tackle translation fault at this time, but we should be thinking about
> what invalidation cmd fault converts into.

Will see if we can add a compact one, or some other solution
for invalidation fault only.

Thanks
Nic