[PATCH RFCv1 05/14] iommufd: Add IOMMUFD_OBJ_VIOMMU and IOMMUFD_CMD_VIOMMU_ALLOC

Nicolin Chen nicolinc at nvidia.com
Tue May 21 17:13:50 PDT 2024


On Tue, May 21, 2024 at 03:05:55PM -0300, Jason Gunthorpe wrote:
> On Tue, May 14, 2024 at 06:20:06PM -0700, Nicolin Chen wrote:
> > > I suspect 0 should be reserved as a non-set value for some
> > > basic sanity in all these driver type enums.
> > 
> > We have an IOMMU_HWPT_DATA_NONE for HWPT_ALLOC to be compatible
> > with an S2 hwpt, since it doesn't need any driver data.
> > 
> > Maybe we can have an IOMMU_VIOMMU_TYPE_DEFAULT defined as 0, for
> > an IOMMU driver (e.g. VT-d) that doesn't need to handle or even
> > be aware of any viommu object?
> 
> Seems like a good practice, and perhaps userspace will find value in a
> generic viommu object that is always present.

Yea. The VMM is always allowed to create a viommu to wrap an S2
HWPT. Then, I assume iommufd in this case should allocate a default
viommu object itself when the driver doesn't implement
domain_ops->viommu_alloc.
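
Roughly something like this -- just a sketch for discussion; the
enum values and the iommufd_viommu_alloc_default() helper are made
up here, not taken from the posted patches:

    /* Hypothetical uAPI addition: reserve 0 as a driver-agnostic type */
    enum iommu_viommu_type {
            IOMMU_VIOMMU_TYPE_DEFAULT = 0,
            IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
    };

    /*
     * And in the IOMMUFD_CMD_VIOMMU_ALLOC handler, fall back to a
     * plain viommu object wrapping the S2 HWPT when the driver has
     * no viommu_alloc op (mirroring the domain_ops->viommu_alloc
     * mentioned above). iommufd_viommu_alloc_default() is a made-up
     * name for that fallback path.
     */
    if (!domain->ops->viommu_alloc) {
            if (cmd->type != IOMMU_VIOMMU_TYPE_DEFAULT)
                    return -EOPNOTSUPP;
            viommu = iommufd_viommu_alloc_default(ucmd->ictx, hwpt_paging);
    } else {
            viommu = domain->ops->viommu_alloc(idev->dev, domain,
                                               ucmd->ictx, cmd->type);
    }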

> > That makes a lot of sense. I'd need to go through the QEMU code and
> > see how we will accommodate these two more naturally: likely
> > the QEMU core should allocate an S2 HWPT for a VM, while the
> > viommu code should allocate a VIOMMU for each instance.
> 
> I'd suggest that core qemu should allocate the S2 IOAS and pass that
> to the qemu viommu driver.
>
> The qemu viommu driver should create the hwpt and then the viommu and
> perhaps return the viommu or hwpt back to the core code.
>
> If the vSTE flow above is used for identity then the qemu viommu
> driver would also have to go and create vSTEs for identity and attach
> them to all devices before the VM starts up. Then when the OS
> activates the SMMU it would have to mirror the real vSTE from guest
> memory to the kernel.

The entire flow makes sense to me.
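
Concretely, I'd imagine roughly this sequence on the QEMU side
(error handling dropped). IOMMU_IOAS_ALLOC and IOMMU_HWPT_ALLOC with
IOMMU_HWPT_ALLOC_NEST_PARENT are existing uAPIs; the viommu ioctl and
its struct layout just follow this RFC and may well change, and
iommufd/dev_id are placeholders:

    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    /* qemu core: allocate the shared S2 IOAS */
    struct iommu_ioas_alloc ioas = { .size = sizeof(ioas) };
    ioctl(iommufd, IOMMU_IOAS_ALLOC, &ioas);

    /* qemu viommu driver: S2 HWPT as a nesting parent over that IOAS */
    struct iommu_hwpt_alloc hwpt = {
            .size = sizeof(hwpt),
            .flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
            .dev_id = dev_id,
            .pt_id = ioas.out_ioas_id,
    };
    ioctl(iommufd, IOMMU_HWPT_ALLOC, &hwpt);

    /* qemu viommu driver: wrap the S2 HWPT in a viommu (proposed ioctl) */
    struct iommu_viommu_alloc viommu = {
            .size = sizeof(viommu),
            .type = IOMMU_VIOMMU_TYPE_DEFAULT,
            .dev_id = dev_id,
            .hwpt_id = hwpt.out_hwpt_id,
    };
    ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &viommu);

    /* ...then mirror vSTEs from guest memory via nested HWPTs, as above */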

> Not sure there is value in having the core qemu code directly access
> the hwpt/viommu?

I think so, though there might be some complication here.

On one side, it may not be straightforward for a qemu viommu
driver to hold a shared S2 hwpt, as the driver is typically
per instance, though I think it can keep the viommu to itself.
So passing the S2 hwpt back to the qemu core and tying it to
the iommufd handle (ictx) makes sense.

On the other side, some future HW could potentially support two
or more kinds of IO page tables, so a VM may end up with two or
more S2 hwpts. Then the core would hold a list of S2 hwpts, and
the viommu driver would need to try-and-allocate a viommu against
each entry in the list (rough sketch below)..
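
I.e. something vaguely like this -- completely made-up QEMU-side
structures, just to show the shape of the try-and-allocate loop;
struct iommu_viommu_alloc again follows this RFC's proposal:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    #define MAX_S2_HWPTS 4          /* arbitrary for the sketch */

    /* qemu core would own the iommufd context and all S2 hwpts */
    struct vm_iommufd_core {
            int fd;                 /* iommufd (ictx) */
            uint32_t s2_hwpt_ids[MAX_S2_HWPTS];
            unsigned int nr_s2_hwpts;
    };

    /* per-instance viommu driver tries each S2 hwpt until one works */
    static int viommu_probe(struct vm_iommufd_core *core, uint32_t dev_id,
                            uint32_t type, uint32_t *out_viommu_id)
    {
            for (unsigned int i = 0; i < core->nr_s2_hwpts; i++) {
                    struct iommu_viommu_alloc cmd = {
                            .size = sizeof(cmd),
                            .type = type,
                            .dev_id = dev_id,
                            .hwpt_id = core->s2_hwpt_ids[i],
                    };

                    if (!ioctl(core->fd, IOMMU_VIOMMU_ALLOC, &cmd)) {
                            *out_viommu_id = cmd.out_viommu_id;
                            return 0;
                    }
            }
            return -1;  /* no compatible S2 hwpt for this viommu type */
    }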

Thanks
Nicolin


