[RFC PATCH v2 4/7] iommufd: Associate kvm pointer to iommufd ctx

Mon Jun 24 09:53:00 PDT 2024

On Thu, Feb 08, 2024, Jason Gunthorpe wrote:
> On Thu, Feb 08, 2024 at 03:18:34PM +0000, Shameer Kolothum wrote:
> 
> > diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> > index 991f864d1f9b..28ede82bb1a6 100644
> > --- a/drivers/iommu/iommufd/iommufd_private.h
> > +++ b/drivers/iommu/iommufd/iommufd_private.h
> > @@ -16,6 +16,7 @@ struct iommu_domain;
> >  struct iommu_group;
> >  struct iommu_option;
> >  struct iommufd_device;
> > +struct kvm;
> >  
> >  struct iommufd_ctx {
> >  	struct file *file;
> > @@ -27,6 +28,8 @@ struct iommufd_ctx {
> >  	/* Compatibility with VFIO no iommu */
> >  	u8 no_iommu_mode;
> >  	struct iommufd_ioas *vfio_ioas;
> > +	/* Associated KVM pointer */
> > +	struct kvm *kvm;
> >  };
> 
> Associating the KVM with the entire iommufd is a big hammer, is this
> what we want to do?
> 
> I know it has to be linked to domain allocation and the coming
> "viommu" object, and it is already linked to VFIO.
> 
> It means we support one KVM per iommufd (which doesn't seem
> unreasonable, but also the first time we've had such a limitation)

And if KVM+iommufd come as pairs, wouldn't iommufd_ctx_set_kvm() need to reject
attempts to bind devices associated with different KVMs, as opposed to silently
ignoring that case?  E.g. something like the below?  Or is there magic elsewhere
in the stack that prevents such a scenario?

void iommufd_ctx_set_kvm(struct iommufd_ctx *ictx, struct kvm *kvm)
{
	int r = 0;

	xa_lock(&ictx->objects);
	if (!ictx->kvm)
		ictx->kvm = kvm;
	else if (icxt->kvm != kvm)
		r = -EINVAL;
	xa_unlock(&ictx->objects);
}

> The other option would be to pass in the kvm to the individual sub
> objects.
> 
> Kevin?
> 
> Sean would you be OK with this approach considering your other series
> to try to make more of this private?

Sorry, I completely missed this.

If kvm_pinned_vmid_{get,put}() are implemented directly by KVM ARM, then I don't
have any immediate concerns, as KVM ARM is a long, long way from being able to
isolate KVM from the core kernel.  And if we ever want/need to provide generic
APIs for propagating state from KVM into iommufd, bundling the KVM file pointer[*]
with a set of function pointers wouldn't be terribly difficult.

That said, I find the on-demand pinning to be very odd.  IIUC, if KVM runs out
of pinnable VMIDs, attaching a device to the KVM+iommu will fail.  Failing an
iommufd operation because of a (potentially transient) KVM resource issue is
rather unpleasant.

And assuming that pinnable VMIDs are a somewhat scarce resource, it wouldn't
suprise me if someone wanted to add cgroup integration, e.g. similar to the
misc cgroup that's used to manage SEV(-ES) ASIDs on KVM AMD (IIUC, an SEV ASID
is analagous to an ARM VMID).

Rather than on-demand pinning, would it make sense to have KVM provide an ioctl()
(or capability, or VM type) to let userspace pin a VM's VMID?  That would allow
for a much saner failure mode, and I suspect would be cleaner in general for iommufd.

[*] https://lore.kernel.org/all/ZXkVSKULLivrMkBl@google.com