[PATCH v1 14/14] iommu/arm-smmu-v3: Add arm_smmu_cache_invalidate_user

Mon Mar 20 09:59:45 PDT 2023

On Mon, Mar 20, 2023 at 01:04:35PM -0300, Jason Gunthorpe wrote:

> > > We need an ioctl for this, I think. To load a map of vSID to dev_id
> > > into the driver. Kernel will convert dev_id to pSID. Driver will
> > > program the map into HW.
> > 
> > Can we just pass a vSID via the alloc ioctl like this?
> > 
> > -----------------------------------------------------------
> > @@ -429,7 +429,7 @@ struct iommu_hwpt_arm_smmuv3 {
> >  #define IOMMU_SMMUV3_FLAG_VMID (1 << 1) /* vmid override */
> >         __u64 flags;
> >         __u32 s2vmid;
> > -       __u32 __reserved;
> > +       __u32 sid;
> >         __u64 s1ctxptr;
> >         __u64 s1cdmax;
> >         __u64 s1fmt;
> > -----------------------------------------------------------
> > 
> > An alloc is initiated by an SMMU_CMD_CFGI_STE command that has
> > an SID field anyway.
> 
> No, a HWPT is not a device or a SID. a HWPT is an ASID in the ARM
> model.
> 
> dev_id is the SID.
> 
> The cfgi_ste will carry the vSID which is mapped to an iommufd dev_id.
> 
> The kernel has to translate the vSID to the dev_id to the pSID to
> issue an ATC invalidation for the correct entity.

OK. This narrative makes sense. I think our solution (the entire
stack) here mixes these two terms between HWPT/ASID and STE/SID.

What QEMU does is trap an SMMU_CMD_CFGI_STE command and send
the host an HWPT alloc ioctl. The former is based on an SID
or a device, while the latter is based on an ASID.

So the correct way should be for QEMU to maintain an ASID-based
list, corresponding to the s1ctxptr from STEs, and only send an
alloc ioctl upon a new s1ctxptr/ASID. Meanwhile, at every trap
of SMMU_CMD_CFGI_STE, it calls a separate ioctl to tie a vSID to
a dev_id (and pSID accordingly).

In other words, an SMMU_CMD_CFGI_STE should do a mandatory SID
ioctl and an optional HWPT alloc ioctl (only allocates a HWPT if
the s1ctxptr in the STE is new).
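
A rough userspace sketch of that flow, for illustration only. The
names handle_cfgi_ste(), bind_vsid(), alloc_hwpt() and struct
vsmmu_state are hypothetical, and the two helpers merely count calls
in place of the real ioctls (the SID one does not exist yet):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HWPTS 16

/* Hypothetical bookkeeping: one entry per distinct guest s1ctxptr. */
struct hwpt_entry {
	uint64_t s1ctxptr;
	uint32_t hwpt_id;
};

struct vsmmu_state {
	struct hwpt_entry hwpts[MAX_HWPTS];
	size_t num_hwpts;
	unsigned int sid_binds;   /* calls to the stand-in SID ioctl */
	unsigned int hwpt_allocs; /* calls to the stand-in alloc ioctl */
};

/* Stand-in for the proposed vSID -> dev_id ioctl (assumption). */
static int bind_vsid(struct vsmmu_state *st, uint32_t vsid, uint32_t dev_id)
{
	(void)vsid;
	(void)dev_id;
	st->sid_binds++;
	return 0;
}

/* Stand-in for the HWPT alloc ioctl; hands out a fake HWPT id. */
static uint32_t alloc_hwpt(struct vsmmu_state *st, uint64_t s1ctxptr)
{
	(void)s1ctxptr;
	st->hwpt_allocs++;
	return 100 + st->hwpt_allocs;
}

/* Called on every trapped SMMU_CMD_CFGI_STE. */
static int handle_cfgi_ste(struct vsmmu_state *st, uint32_t vsid,
			   uint32_t dev_id, uint64_t s1ctxptr,
			   uint32_t *hwpt_id)
{
	size_t i;
	int ret;

	assert(st && hwpt_id);

	/* Mandatory: tie the vSID to the dev_id on every trap. */
	ret = bind_vsid(st, vsid, dev_id);
	if (ret)
		return ret;

	/* Optional: reuse the HWPT if this s1ctxptr was seen before. */
	for (i = 0; i < st->num_hwpts; i++) {
		if (st->hwpts[i].s1ctxptr == s1ctxptr) {
			*hwpt_id = st->hwpts[i].hwpt_id;
			return 0;
		}
	}

	if (st->num_hwpts == MAX_HWPTS)
		return -1; /* list full */

	/* New s1ctxptr: allocate a HWPT and remember it. */
	*hwpt_id = alloc_hwpt(st, s1ctxptr);
	st->hwpts[st->num_hwpts].s1ctxptr = s1ctxptr;
	st->hwpts[st->num_hwpts].hwpt_id = *hwpt_id;
	st->num_hwpts++;
	return 0;
}
```

With this shape, two STEs pointing at the same s1ctxptr would share
one HWPT, while each of them still gets its own vSID binding.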

What could be a good prototype of the ioctl? Would it be a VFIO
device ioctl or an IOMMUFD one?

> > > SW path will program the map into an xarray
> > 
> > I found a tricky thing about SIDs in the SMMU driver when doing
> > this experiment: the SMMU kernel driver mostly handles devices
> > using struct arm_smmu_master. However, an arm_smmu_master might
> > have a num_streams>1, meaning a device can have multiple SIDs.
> > Though it seems that PCI devices might not be in this scope, a
> > plain xarray might not work for other types of devices in the
> > long run, if there are any?
> 
> You'd replicate each of the vSIDs of the extra SIDs in the xarray.

Noted it down.
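
Roughly, as a userspace sketch only: a fixed-size table stands in for
the kernel xarray, struct master is a simplified stand-in for struct
arm_smmu_master, and all helper names are hypothetical. It also
assumes the i-th vSID pairs with the i-th physical stream of the
master, which nothing in this thread has settled:

```c
#include <stddef.h>
#include <stdint.h>

#define VSID_MAP_SIZE 64

struct vsid_entry {
	int used;
	uint32_t vsid;
	uint32_t dev_id;
	uint32_t psid;
};

/* Stand-in for the xarray indexed by vSID. */
struct vsid_map {
	struct vsid_entry e[VSID_MAP_SIZE];
};

/* Simplified master: one dev_id, possibly several physical SIDs. */
struct master {
	uint32_t dev_id;
	size_t num_streams;
	const uint32_t *psids;
};

/* Store or overwrite one vSID entry; returns -1 if the table is full. */
static int vsid_map_store(struct vsid_map *m, uint32_t vsid,
			  uint32_t dev_id, uint32_t psid)
{
	struct vsid_entry *slot = NULL;
	size_t i;

	for (i = 0; i < VSID_MAP_SIZE; i++) {
		if (m->e[i].used && m->e[i].vsid == vsid) {
			slot = &m->e[i];
			break;
		}
		if (!m->e[i].used && !slot)
			slot = &m->e[i];
	}
	if (!slot)
		return -1;

	slot->used = 1;
	slot->vsid = vsid;
	slot->dev_id = dev_id;
	slot->psid = psid;
	return 0;
}

/* Look up a vSID; returns NULL if it was never mapped. */
static const struct vsid_entry *vsid_map_load(const struct vsid_map *m,
					      uint32_t vsid)
{
	size_t i;

	for (i = 0; i < VSID_MAP_SIZE; i++)
		if (m->e[i].used && m->e[i].vsid == vsid)
			return &m->e[i];
	return NULL;
}

/* Replicate one entry per stream, so num_streams > 1 is covered. */
static int vsid_map_add_master(struct vsid_map *m, const struct master *ms,
			       const uint32_t *vsids)
{
	size_t i;

	for (i = 0; i < ms->num_streams; i++)
		if (vsid_map_store(m, vsids[i], ms->dev_id, ms->psids[i]))
			return -1;
	return 0;
}
```

An ATC invalidation carrying any of the master's vSIDs would then
resolve to the matching pSID with a single lookup.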

> > > > cache_invalidate_user as void, like we are doing now? A fault
> > > > injection pathway to report CERROR asynchronously is what we've
> > > > been doing though -- even with Eric's previous VFIO solution.
> > > 
> > > Where is this? How does it look?
> > 
> > That's postponed with the PRI support, right? My use case does
> > not need PRI actually, but a fault injection pathway to guests.
> > This pathway should be able to take care of any CERROR (detected
> > by a host interrupt) or something funky in the
> > cache_invalidate_user request itself?
> 
> I would expect that if invalidation can fail that we have a way to
> signal that failure back to the guest.

That's plausible to me, and it could apply to a translation
fault too. So, should we add back the iommufd infrastructure
for the fault injection (without PRI), in v2?

Thanks
Nic