[PATCH RFCv1 04/14] iommufd: Add struct iommufd_viommu and iommufd_viommu_ops

Sun May 12 07:03:53 PDT 2024

On Fri, Apr 12, 2024 at 08:47:01PM -0700, Nicolin Chen wrote:
> Add a new iommufd_viommu core structure to represent a vIOMMU instance in
> the user space, typically backed by a HW-accelerated feature of an IOMMU,
> e.g. NVIDIA CMDQ-Virtualization (an ARM SMMUv3 extension) and AMD Hardware
> Accelerated Virtualized IOMMU (vIOMMU).

I expect this will also be the only way to pass in an associated KVM:
userspace would supply the kvm when creating the viommu.

The tricky bit of this flow is how to manage the S2. It is necessary
that the S2 be linked to the viommu:

 1) ARM BTM requires the VMID to be shared with KVM
 2) AMD and others need the S2 translation because some of the HW
    acceleration is done inside the guest address space

I haven't looked closely at AMD but presumably the VIOMMU create will
have to install the S2 into a DID or something?

So we need the S2 to exist before the VIOMMU is created, but the
drivers are going to need some more fixing before that will fully
work.

Does the nesting domain create need the viommu as well (in place of
the S2 hwpt)? That feels sort of natural.

There is still a lot of fixing before everything can work fully, but
do we need to make some preparations here in the uapi? Like starting
to thread the S2 through it as I described?

Kevin, does Intel foresee any viommu needs on current/future Intel HW?
I assume you are thinking about invalidation queue bypass like
everyone else. I think it is an essential feature for vSVA.

> A driver should embed this core structure in its driver viommu structure
> and call the new iommufd_viommu_alloc() helper to allocate a core/driver
> structure bundle and fill its core viommu->ops:
>     struct my_driver_viommu {
>         struct iommufd_viommu core;
>         ....
>     };
> 
>     static const struct iommufd_viommu_ops my_driver_viommu_ops = {
>         .free = my_driver_viommu_free,
>     };
> 
>     struct my_driver_viommu *my_viommu =
>             iommufd_viommu_alloc(my_driver_viommu, core);

Why don't we have an ictx here anyhow? The caller has it, doesn't it?
Just pass it down and then it is the normal pattern:

my_viommu = iommufd_object_alloc_elm(ictx, my_viommu, IOMMUFD_OBJ_HWPT_VIOMMU, core.obj);

And abort works properly for error handling.
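
To illustrate the shape of that pattern, here is a minimal userspace
sketch, not kernel code. Every mock_* name, MOCK_OBJ_VIOMMU, hw_queue_id
and the fail_hw_init knob are hypothetical stand-ins, and the real
iommufd helpers differ. The sketch assumes the object header sits at
offset 0 of the driver structure, so the same pointer can be handed to
abort and freed directly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the real iommufd definitions. */
enum { MOCK_OBJ_VIOMMU = 1 };

struct mock_ctx {
    int live_objects;               /* allocated and not yet aborted */
};

struct mock_object {
    struct mock_ctx *ictx;
    int type;
};

struct mock_viommu;

struct mock_viommu_ops {
    void (*free)(struct mock_viommu *viommu);
};

struct mock_viommu {
    struct mock_object obj;         /* assumed to be the first member */
    const struct mock_viommu_ops *ops;
};

/* Allocate a zeroed core/driver bundle and account for it in the ctx. */
static void *mock_object_alloc(struct mock_ctx *ictx, size_t size, int type)
{
    struct mock_object *obj;

    assert(size >= sizeof(*obj));
    obj = calloc(1, size);
    if (!obj)
        return NULL;
    obj->ictx = ictx;
    obj->type = type;
    ictx->live_objects++;
    return obj;
}

/* Undo mock_object_alloc() on an error path. */
static void mock_object_abort(struct mock_object *obj)
{
    assert(obj->ictx->live_objects > 0);
    obj->ictx->live_objects--;
    free(obj);
}

struct my_driver_viommu {
    struct mock_viommu core;
    int hw_queue_id;                /* hypothetical driver-private state */
};

/* The sketch frees through the object header, so it must come first. */
_Static_assert(offsetof(struct my_driver_viommu, core) == 0 &&
               offsetof(struct mock_viommu, obj) == 0,
               "object header must be at offset 0");

/* Driver teardown hook; registered in the ops but not invoked here. */
static void my_driver_viommu_free(struct mock_viommu *viommu)
{
    (void)viommu;
}

static const struct mock_viommu_ops my_driver_viommu_ops = {
    .free = my_driver_viommu_free,
};

/* The ctx is passed down, so allocation and abort are both tracked. */
static struct my_driver_viommu *
my_driver_viommu_create(struct mock_ctx *ictx, int fail_hw_init)
{
    struct my_driver_viommu *my_viommu =
        mock_object_alloc(ictx, sizeof(*my_viommu), MOCK_OBJ_VIOMMU);

    if (!my_viommu)
        return NULL;
    my_viommu->core.ops = &my_driver_viommu_ops;
    if (fail_hw_init) {
        /* Simulated HW setup failure: unwind through abort. */
        mock_object_abort(&my_viommu->core.obj);
        return NULL;
    }
    my_viommu->hw_queue_id = 1;
    return my_viommu;
}
```

The point is only that once the ctx is available at allocation time, the
error path is a single abort call and the driver has nothing to unwind
by hand.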

Jason