[PATCH v4 11/23] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl

Mon May 19 11:14:22 PDT 2025

On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
> Jason, Nicolin, Kevin,
> 
> 
> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
> > On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
> >> +/**
> >> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> >> + * @size: sizeof(struct iommu_hw_queue_alloc)
> >> + * @flags: Must be 0
> >> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> >> + * @type: One of enum iommu_hw_queue_type
> >> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
> >> + *         model
> >> + * @out_hw_queue_id: The ID of the new HW queue
> >> + * @base_addr: Base address of the queue memory in guest physical address space
> >> + * @length: Length of the queue memory in the guest physical address space
> >> + *
> >> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
> >> + * allows HW to access a guest queue memory described by @base_addr and @length.
> >> + * Upon success, the underlying physical pages of the guest queue memory will be
> >> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
> >> + * destroyed.
> > 
> > Do we have way to make the pinning optional?
> > 
> > As I understand AMD's system the iommu HW itself translates the
> > base_addr through the S2 page table automatically, so it doesn't need
> > pinned memory and physical addresses but just the IOVA.
> 
> Correct. HW will translate GPA -> SPA automatically using below information.
> 
> AMD IOMMU need special device ID to setup with  GPA -> SPA mapping per VM.
> and its programmed in VF Control BAR (VFCntlMMIO Offset {16’b[GuestID],
> 6’b01_0000} Guest Miscellaneous Control Register). IOMMU HW will use this
> address for GPA to SPA translation for buffers like command buffer.
> 
> So HW will use Base address (GPA), head/tail pointer to get the offset from
> Base. Then it will use GPA -> SPA translation.
> 
> 
> > 
> > Perhaps for this reason the pinning should be done with a function
> > call from the driver?
> 
> We still need to make sure memory allocated for page is present in memory so
> that IOMMU HW can access it.
> 
> Pinning at the time of guest boot is enough here -OR- do we need to increase
> reference in queue_alloc() path ?

For NVIDIA's vCMDQ that reads host PA directly, pages should be
pinned once when stage 2 mappings are created for the guest RAM,
and iommu_hw_queue_alloc() should pin the pages again to prevent
the gPA from being unmapped in the stage 2 page table. Otherwise
it will be a security hole, as HW continues to read the unmapped
memory through physical address space.

I understand that AMD Command Buffer also needs the S2 mappings
to be present in order to work correctly. But what happens if a
queue memory that isn't pinned (or even gets unmapped)? Will it
raise a translation fault v.s. HW reading the unmapped memory?

If so, I think this is Jason's point: there would be unlikely a
security hole, i.e. for AMD, iommu_hw_queue_alloc() pinning the
physical pages is likely optional.

Thanks
Nicolin