[PATCH RFCv1 08/14] iommufd: Add IOMMU_VIOMMU_SET_DEV_ID ioctl

Mon Jun 10 15:01:10 PDT 2024

On Mon, Jun 10, 2024 at 01:01:32PM -0700, Nicolin Chen wrote:
> On Mon, Jun 10, 2024 at 09:04:46AM -0300, Jason Gunthorpe wrote:
> > On Fri, Jun 07, 2024 at 02:19:21PM -0700, Nicolin Chen wrote:
> > 
> > > > IOTLB efficiency will suffer though when splitting 1p -> 2v while
> > > > invalidation performance will suffer when joining 2p -> 1v.
> > > 
> > > I think the invalidation efficiency is actually solvable. So,
> > > basically viommu_invalidate would receive a whole batch of cmds
> > > and dispatch them to different pSMMUs (nested_domains/devices).
> > > We already have a vdev_id table for devices, yet we just need a
> > > new vasid table for nested_domains. Right?
> > 
> > You can't know the ASID usage of the hypervisor from the VM, unless
> > you also inspect the CD table memory in the guest. That seems like
> > something we should try hard to avoid.
> 
> Actually, even now, with a dispatcher in the VMM, the VMM still
> decodes the CD table to link each ASID to an s1_hwpt. Otherwise, it
> could only broadcast a TLBI cmd to all pSMMUs.

No, there should be no CD table decoding and no linking ASID to
anything by the VMM.

The ARM architecture is clean here: the ASID can remain private to the
VM, and there is no reason for the VMM to understand it.

The s1_hwpt is derived only from the vSTE and nothing more. It would
be fine for all the devices to have their own s1_hwpts with their own
vSTEs inside them.

> > > With that being said, it would make the kernel design a bit more
> > > complicated. And the VMM still has to separate the commands for
> > > passthrough devices (HW iotlb) from commands for emulated devices
> > > (emulated iotlb), unless we further split the topology at the VM
> > > level to have a dedicated vSMMU for all passthrough devices --
> > > then VMM could just forward its entire cmdq to the kernel without
> > > deciphering every command (likely?).
> > 
> > I would not include the emulated devices in a shared SMMU.. For the
> > same reason, we should try hard to avoid inspecting the page table
> > memory.
> 
> I wouldn't like the idea of attaching emulated devices to a shared
> vSMMU. Yet, mind elaborating why this would inspect the page table
> memory? Or do you mean we should avoid VMM inspecting user tables?

Emulated devices can't use the HW page table walker in the SMMU since
there is no clean CD linkage for them to use. Instead, the VMM has to
walk the guest page tables manually and convert them into an IOAS. It
is a big PITA, best to be avoided.

> > If a viommu is needed for emulated then virtio-iommu may be more
> > appropriate..
> > 
> > That said I'm sure someone will want to do this, so as long as it is
> > possible in the VMM, as slow as it may be, then it is fine.
> 
> Eric hasn't replied to my previous query about how to design this,
> but I guess the same. And it looks like Intel is doing so for
> emulated devices, since there is only one intel_iommu instance in a VM.

Yes, Intel has long had this code to build VFIO containers from page
table walks.

Jason