[RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx

Mon Jun 24 10:07:47 PDT 2024

On Mon, Jun 24, 2024 at 09:53:00AM -0700, Sean Christopherson wrote:
> > Associating the KVM with the entire iommufd is a big hammer, is this
> > what we want to do?
> > 
> > I know it has to be linked to domain allocation and the coming
> > "viommu" object, and it is already linked to VFIO.
> > 
> > It means we support one KVM per iommufd (which doesn't seem
> > unreasonable, but also the first time we've had such a limitation)
> 
> And if KVM+iommufd come as pairs, wouldn't iommufd_ctx_set_kvm() need to reject
> attempts to bind devices associated with different KVMs, as opposed to silently
> ignoring that case?  E.g. something like the below?  Or is there magic elsewhere
> in the stack that prevents such a scenario?

Yes, it would need things like that.

But I think based on other discussions we are likely to tie the KVM
to the coming IOMMUFD VIOMMU object, and the KVM will probably be
provided at object creation time to avoid this issue.
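
To illustrate the kind of check being discussed (reject a second,
different KVM instead of silently ignoring it), here is a rough
user-space model. It is only a sketch: struct iommufd_ctx_model,
ctx_set_kvm_model() and the -EINVAL return are hypothetical stand-ins,
not the kernel's actual definitions or the code from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's real structures. */
struct kvm {
	int id;
};

struct iommufd_ctx_model {
	struct kvm *kvm;	/* NULL until the first bind provides a KVM */
};

/*
 * Model of a "set KVM" helper that refuses to mix KVMs in one context.
 * Returns 0 on success, -EINVAL if a different KVM is already associated.
 */
static int ctx_set_kvm_model(struct iommufd_ctx_model *ictx, struct kvm *kvm)
{
	if (!kvm)
		return 0;	/* device without a KVM: nothing to record */
	if (ictx->kvm && ictx->kvm != kvm)
		return -EINVAL;	/* reject rather than silently ignore */
	ictx->kvm = kvm;
	return 0;
}
```

With the KVM supplied once at object creation time, as suggested
above, a check like this would not be needed on every bind.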

> > Sean would you be OK with this approach considering your other series
> > to try to make more of this private?
> 
> Sorry, I completely missed this.
> 
> If kvm_pinned_vmid_{get,put}() are implemented directly by KVM ARM, then I don't
> have any immediate concerns, as KVM ARM is a long, long way from being able to
> isolate KVM from the core kernel.  

I think that is reasonable; I also don't really see VMID as being
general. We will have to figure out how to ensure that the KVM FD we
got is an ARM KVM FD.

> That said, I find the on-demand pinning to be very odd.  IIUC, if KVM runs out
> of pinnable VMIDs, attaching a device to the KVM+iommu will fail.  Failing an
> iommufd operation because of a (potentially transient) KVM resource issue is
> rather unpleasant.

It is kind of subtle, but the only things that will consume VMIDs are
IOMMUFD operations that are working with nested translation but not
providing a KVM. This is a pretty small blast radius - ie a specific
qemu will fail to start - so I think we can tolerate it.

More normal iommu operation will not require VMIDs so things like
driver attaching/etc are fine.

> And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
> suprise me if someone wanted to add cgroup integration, e.g. similar to the
> misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
> is analagous to an ARM VMID).

Yeah, but if someone is using such a cgroup then I expect they will
also have an up-to-date VMM that doesn't trigger this VMID allocation
in the first place...

> Rather than on-demand pinning, would it make sense to have KVM provide an ioctl()
> (or capability, or VM type) to let userspace pin a VM's VMID?  That would allow
> for a much saner failure mode, and I suspect would be cleaner in general for iommufd.

The point of this mechanism is to support using this iommufd feature
without a KVM at all. We could instead prevent this directly 100% of
the time, but it means that HW with this BTM capability would not run
the legacy VMMs at all, so I'm not that keen on it.

When a KVM is present the iommu needs to adopt the VMID of the KVM,
and that should have a mechanism to ensure the VMID is valid so long
as the IOMMU is using it (eg because the KVM FD is open).
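
As a rough picture of the lifetime rule and of the failure mode Sean
describes, a pinned VMID can be modelled as a refcount drawn from a
small pool. This is only a user-space sketch under my own assumptions:
the names loosely follow kvm_pinned_vmid_{get,put}(), but struct
vm_model, NR_PINNABLE and the -ENOSPC return are hypothetical and not
taken from the series.

```c
#include <errno.h>

#define NR_PINNABLE 2	/* hypothetical pool size, tiny so exhaustion is visible */

struct vm_model {
	int vmid;		/* VMID currently assigned to this VM */
	unsigned int pins;	/* number of IOMMU users relying on the VMID */
};

static unsigned int nr_pinned;	/* VMs currently holding at least one pin */

/* Take a pin; returns the VMID, or -ENOSPC if no pinnable VMID is left. */
static int pinned_vmid_get_model(struct vm_model *vm)
{
	if (vm->pins == 0) {
		if (nr_pinned == NR_PINNABLE)
			return -ENOSPC;
		nr_pinned++;
	}
	vm->pins++;
	return vm->vmid;
}

/* Drop a pin; the VMID may be recycled only after the last pin is gone. */
static void pinned_vmid_put_model(struct vm_model *vm)
{
	if (vm->pins && --vm->pins == 0)
		nr_pinned--;
}
```

While the pin count is non-zero the VMID has to stay stable, and once
the pool is full the next get fails, which is the failure that would
surface through the iommufd operation.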

Jason