[PATCH RFCv1 05/14] iommufd: Add IOMMUFD_OBJ_VIOMMU and IOMMUFD_CMD_VIOMMU_ALLOC

Jason Gunthorpe jgg at nvidia.com
Sun May 12 07:27:45 PDT 2024


On Fri, Apr 12, 2024 at 08:47:02PM -0700, Nicolin Chen wrote:

> +int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_viommu_alloc *cmd = ucmd->cmd;
> +	struct iommufd_hwpt_paging *hwpt_paging;
> +	struct iommu_device *iommu_dev;
> +	struct iommufd_viommu *viommu;
> +	struct iommufd_device *idev;
> +	int rc;
> +
> +	if (cmd->flags)
> +		return -EOPNOTSUPP;
> +
> +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> +	if (IS_ERR(idev))
> +		return PTR_ERR(idev);
> +	iommu_dev = idev->dev->iommu->iommu_dev;
> +
> +	if (!iommu_dev->ops->viommu_alloc) {
> +		rc = -EOPNOTSUPP;
> +		goto out_put_idev;
> +	}
> +
> +	hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
> +	if (IS_ERR(hwpt_paging)) {
> +		rc = PTR_ERR(hwpt_paging);
> +		goto out_put_idev;
> +	}
> +
> +	if (!hwpt_paging->nest_parent) {
> +		rc = -EINVAL;
> +		goto out_put_hwpt;
> +	}
> +
> +	viommu = iommu_dev->ops->viommu_alloc(idev->dev, cmd->type,
> +					      hwpt_paging->common.domain);
> +	if (IS_ERR(viommu)) {
> +		rc = PTR_ERR(viommu);
> +		goto out_put_hwpt;
> +	}

Ah, you did already include the S2, so should it be
domain->viommu_alloc() then?
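To illustrate the suggestion, here is a minimal user-space model (hypothetical names, not the real iommufd structs): hanging viommu_alloc off the S2 domain's ops makes the parent domain the receiver of the call, instead of passing it as an argument to a device op.

```c
#include <assert.h>
#include <stdlib.h>

struct viommu {
	int type;
};

struct domain;

struct domain_ops {
	/* The S2 domain is the receiver, so drivers don't take it as an arg */
	struct viommu *(*viommu_alloc)(struct domain *domain, int type);
};

struct domain {
	const struct domain_ops *ops;
};

/* Example driver hook: allocate a viommu bound to this S2 domain */
static struct viommu *demo_viommu_alloc(struct domain *domain, int type)
{
	struct viommu *v = malloc(sizeof(*v));

	(void)domain;
	if (v)
		v->type = type;
	return v;
}

static const struct domain_ops demo_ops = {
	.viommu_alloc = demo_viommu_alloc,
};

/* Call site: domain->ops->viommu_alloc(domain, type) would replace
 * iommu_dev->ops->viommu_alloc(dev, type, domain) from the patch above */
static struct viommu *alloc_viommu(struct domain *s2, int type)
{
	if (!s2->ops->viommu_alloc)
		return NULL;
	return s2->ops->viommu_alloc(s2, type);
}
```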

> +
> +	/* iommufd_object_finalize will store the viommu->obj.id */
> +	rc = xa_alloc(&ucmd->ictx->objects, &viommu->obj.id, XA_ZERO_ENTRY,
> +		      xa_limit_31b, GFP_KERNEL_ACCOUNT);
> +	if (rc)
> +		goto out_free;
> +
> +	viommu->obj.type = IOMMUFD_OBJ_VIOMMU;

See my other notes; let's try not to open code this.
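A sketch of that point, with illustrative names rather than the real iommufd helpers: a single init helper reserves the ID and stamps obj.type, so individual ioctls never touch the xarray or the type field directly.

```c
#include <assert.h>

enum obj_type {
	OBJ_NONE,
	OBJ_VIOMMU,
};

struct object {
	unsigned int id;
	enum obj_type type;
};

/* Stand-in for reserving a slot in the per-context xarray */
static int reserve_id(unsigned int *next_id, struct object *obj)
{
	obj->id = (*next_id)++;
	return 0;
}

/* What callers would use instead of open-coded xa_alloc() plus a bare
 * obj.type assignment in each ioctl */
static int object_init(unsigned int *next_id, struct object *obj,
		       enum obj_type type)
{
	int rc = reserve_id(next_id, obj);

	if (rc)
		return rc;
	obj->type = type;
	return 0;
}
```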

> +	viommu->type = cmd->type;
> +
> +	viommu->ictx = ucmd->ictx;
> +	viommu->hwpt = hwpt_paging;
> +	viommu->iommu_dev = idev->dev->iommu->iommu_dev;
> +	cmd->out_viommu_id = viommu->obj.id;
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +	if (rc)
> +		goto out_erase_xa;
> +	iommufd_object_finalize(ucmd->ictx, &viommu->obj);
> +	refcount_inc(&viommu->hwpt->common.obj.users);
> +	goto out_put_hwpt;
> +
> +out_erase_xa:
> +	xa_erase(&ucmd->ictx->objects, viommu->obj.id);
> +out_free:
> +	if (viommu->ops && viommu->ops->free)
> +		viommu->ops->free(viommu);
> +	kfree(viommu);

This really should use the abort flow. The driver free callback has to
be in the object release.
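A sketch of that shape (hypothetical names, modeled on the iommufd object lifecycle rather than copied from it): the driver's free hook is the object's destroy op, so one shared abort path undoes both the ID reservation and the driver allocation, with no per-ioctl xa_erase()/kfree() pairs.

```c
#include <assert.h>
#include <stdlib.h>

struct object;

struct object_ops {
	void (*destroy)(struct object *obj);    /* driver free lives here */
};

struct object {
	unsigned int id;
	const struct object_ops *ops;
	int *destroyed;                         /* test hook, set by destroy */
};

/* Example driver destroy hook */
static void demo_destroy(struct object *obj)
{
	if (obj->destroyed)
		*obj->destroyed = 1;
}

static const struct object_ops demo_object_ops = {
	.destroy = demo_destroy,
};

/* The one error path every allocation ioctl shares: release the ID, then
 * let the object's own destroy op free the driver state */
static void object_abort_and_destroy(struct object *obj)
{
	obj->id = 0;                            /* stand-in for xa_erase() */
	if (obj->ops && obj->ops->destroy)
		obj->ops->destroy(obj);
	free(obj);
}
```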

> +
> +/**
> + * enum iommu_viommu_type - VIOMMU Type
> + * @IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV: NVIDIA Tegra241 CMDQV Extension for SMMUv3
> + */
> +enum iommu_viommu_type {
> +	IOMMU_VIOMMU_TYPE_TEGRA241_CMDQV,
> +};

At least the 241 line should be in a following patch

> +/**
> + * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
> + * @size: sizeof(struct iommu_viommu_alloc)
> + * @flags: Must be 0
> + * @type: Type of the VIOMMU object. Must be defined in enum iommu_viommu_type
> + * @dev_id: The device to allocate this virtual IOMMU for
> + * @hwpt_id: ID of a nested parent HWPT
> + * @out_viommu_id: Output virtual IOMMU ID for the allocated object
> + *
> + * Allocate a virtual IOMMU object that holds a (shared) nested parent HWPT
> + */
> +struct iommu_viommu_alloc {
> +	__u32 size;
> +	__u32 flags;
> +	__u32 type;
> +	__u32 dev_id;
> +	__u32 hwpt_id;
> +	__u32 out_viommu_id;
> +};

This seems fine.

Let's have a following patch that changes hwpt_alloc to accept the
viommu in place of a hwpt, as a uAPI change as well.

The more I think about how this needs to work the more sure I am that
we need to do that.
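One way that could look, sketched with hypothetical names: the nested-alloc parent ID resolves to either an S2 paging hwpt or a viommu, and the alloc path derives the S2 from the viommu when that is what was passed.

```c
#include <assert.h>
#include <stddef.h>

enum obj_type {
	OBJ_HWPT_PAGING,
	OBJ_VIOMMU,
};

struct s2_hwpt {
	int dummy;
};

struct viommu {
	struct s2_hwpt *s2;     /* the viommu refcounts its S2 */
};

struct object {
	enum obj_type type;
	void *priv;
};

/* Return the S2 to nest under, whichever object kind the parent ID named */
static struct s2_hwpt *resolve_parent(struct object *obj)
{
	switch (obj->type) {
	case OBJ_HWPT_PAGING:
		return obj->priv;
	case OBJ_VIOMMU:
		return ((struct viommu *)obj->priv)->s2;
	}
	return NULL;
}
```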

ARM will need a fairly tricky set of things to manage the VMID
lifecycle. In BTM mode the VMID must come from the KVM. For vcmdq the
VMID is needed to create the queue/viommu. For AMD the S2 is needed to
create the VIOMMU in the first place.

So, to make this all work perfectly we need approx the following
 - S2 sharing across instances in ARM - meaning the VMID is allocated
   at attach not domain alloc
 - S2 hwpt is refcounted by the VIOMMU in the iommufd layer
 - VIOMMU is refcounted by every nesting child in the iommufd layer
 - The nesting child holds a pointer to both the S2 and the VIOMMU
   (viommu optional)
 - When the nesting child attaches to a device the STE will source the
   VMID from the VIOMMU if present otherwise from the S2
 - "RID" attach (ie naked S2) will have to be done with a Nesting
   Child using a vSTE that indicates Identity. Then the attach logic
   will have enough information to get the VMID from the VIOMMU
 - In full VIOMMU mode the S2 will never get a VMID of its own, it
   will always use the VIOMMU. Life cycle is simple, the VMID is freed
   when the VIOMMU is freed. That can't happen until all Nesting
   Children are freed. That can't happen until all Nesting Children
   are detached from devices. Detaching removes the HW touch of the VMID.
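The VMID-sourcing rule in the list above can be sketched like this (hypothetical structs for illustration only): the nesting child points at both the S2 and, optionally, the VIOMMU, and attach takes the VMID from the VIOMMU when one is present, falling back to the S2 otherwise.

```c
#include <assert.h>
#include <stddef.h>

struct s2_hwpt {
	int vmid;               /* allocated at attach, per instance */
};

struct viommu {
	struct s2_hwpt *s2;     /* the viommu refcounts the S2 */
	int vmid;               /* e.g. KVM-provided in BTM mode */
};

struct nesting_child {
	struct s2_hwpt *s2;     /* always present */
	struct viommu *viommu;  /* optional */
};

/* STE programming: source the VMID from the VIOMMU if present,
 * otherwise from the S2 */
static int child_vmid(const struct nesting_child *child)
{
	if (child->viommu)
		return child->viommu->vmid;
	return child->s2->vmid;
}
```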

At this point you don't need the full generality, but let's please get
ready and get the viommu pointer available in all the right spots and
we can keep the current logic to borrow the VMID from the S2 for the
VIOMMU.

AMD folks, please consider if this works for you as well.

Jason



More information about the linux-arm-kernel mailing list