[PATCH v4 11/23] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl

Thu May 22 18:51:25 PDT 2025

> From: Vasant Hegde <vasant.hegde at amd.com>
> Sent: Tuesday, May 20, 2025 4:39 PM
> 
> Hi Nicolin,
> 
> 
> On 5/19/2025 11:44 PM, Nicolin Chen wrote:
> > On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
> >> Jason, Nicolin, Kevin,
> >>
> >>
> >> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
> >>> On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
> >>>> +/**
> >>>> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
> >>>> + * @size: sizeof(struct iommu_hw_queue_alloc)
> >>>> + * @flags: Must be 0
> >>>> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
> >>>> + * @type: One of enum iommu_hw_queue_type
> >>>> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
> >>>> + *         model
> >>>> + * @out_hw_queue_id: The ID of the new HW queue
> >>>> + * @base_addr: Base address of the queue memory in guest physical address space
> >>>> + * @length: Length of the queue memory in the guest physical address space
> >>>> + *
> >>>> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
> >>>> + * allows HW to access a guest queue memory described by @base_addr and @length.
> >>>> + * Upon success, the underlying physical pages of the guest queue memory will be
> >>>> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
> >>>> + * destroyed.
> >>>
> >>> Do we have a way to make the pinning optional?
> >>>
> >>> As I understand AMD's system the iommu HW itself translates the
> >>> base_addr through the S2 page table automatically, so it doesn't need
> >>> pinned memory and physical addresses but just the IOVA.
> >>
> >> Correct. HW will translate GPA -> SPA automatically using the information below.
> >>
> >> The AMD IOMMU needs a special device ID to be set up with the GPA -> SPA mapping
> >> per VM, and it is programmed in the VF Control BAR (VFCntlMMIO Offset
> >> {16’b[GuestID], 6’b01_0000} Guest Miscellaneous Control Register). The IOMMU HW
> >> will use this address for GPA to SPA translation for buffers like the command
> >> buffer.
> >>
> >> So HW will use the base address (GPA) and the head/tail pointer to get the
> >> offset from the base. Then it will use the GPA -> SPA translation.
> >>
> >>
> >>>
> >>> Perhaps for this reason the pinning should be done with a function
> >>> call from the driver?
> >>
> >> We still need to make sure the memory allocated for the page is present in
> >> memory so that the IOMMU HW can access it.
> >>
> >> Is pinning at the time of guest boot enough here, -OR- do we need to increase
> >> the reference in the queue_alloc() path?
> >
> > For NVIDIA's vCMDQ that reads host PA directly, pages should be
> > pinned once when stage 2 mappings are created for the guest RAM,
> > and iommu_hw_queue_alloc() should pin the pages again to prevent
> > the gPA from being unmapped in the stage 2 page table. Otherwise
> > it will be a security hole, as HW continues to read the unmapped
> > memory through physical address space.
> >
> > I understand that AMD Command Buffer also needs the S2 mappings
> > to be present in order to work correctly. But what happens if the
> > queue memory isn't pinned (or even gets unmapped)? Will it raise
> > a translation fault, versus the HW reading the unmapped memory?
> 
> If a page is unmapped, then stage 2 (the host page table) gets updated. The
> IOMMU will not be able to find the page and logs a fault.
> 

As long as the fault is contained to the relevant queue, then yes,
we don't need another pinning from the driver.
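
To illustrate the distinction discussed in this thread, here is a toy
user-space C model. It is only a sketch and is not kernel or iommufd code:
every toy_* name, the 4K page size and the flat table are assumptions made
up for the example. It models two behaviours: a queue that is read through
stage 2 sees a fault once its page is unmapped, while a pin taken at queue
allocation time makes the unmap itself fail until the pin is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ULL << TOY_PAGE_SHIFT) - 1)
#define TOY_NPAGES     8

enum toy_status { TOY_OK = 0, TOY_FAULT, TOY_BUSY, TOY_INVAL };

/* One hypothetical stage-2 entry per guest page (GPA -> SPA). */
struct toy_s2_entry {
	bool mapped;
	unsigned int pin_count;
	uint64_t spa;
};

struct toy_s2 {
	struct toy_s2_entry e[TOY_NPAGES];
};

static struct toy_s2_entry *toy_lookup(struct toy_s2 *s2, uint64_t gpa)
{
	uint64_t pfn = gpa >> TOY_PAGE_SHIFT;

	return pfn < TOY_NPAGES ? &s2->e[pfn] : NULL;
}

static enum toy_status toy_map(struct toy_s2 *s2, uint64_t gpa, uint64_t spa)
{
	struct toy_s2_entry *e = toy_lookup(s2, gpa);

	if (!e)
		return TOY_INVAL;
	e->mapped = true;
	e->spa = spa & ~TOY_PAGE_MASK;
	return TOY_OK;
}

/* Unmapping is refused while a queue still holds a pin on the page. */
static enum toy_status toy_unmap(struct toy_s2 *s2, uint64_t gpa)
{
	struct toy_s2_entry *e = toy_lookup(s2, gpa);

	if (!e || !e->mapped)
		return TOY_INVAL;
	if (e->pin_count)
		return TOY_BUSY;
	e->mapped = false;
	return TOY_OK;
}

/* What a queue allocation would do for HW that reads host PA directly. */
static enum toy_status toy_pin(struct toy_s2 *s2, uint64_t gpa)
{
	struct toy_s2_entry *e = toy_lookup(s2, gpa);

	if (!e || !e->mapped)
		return TOY_INVAL;
	e->pin_count++;
	return TOY_OK;
}

static void toy_unpin(struct toy_s2 *s2, uint64_t gpa)
{
	struct toy_s2_entry *e = toy_lookup(s2, gpa);

	assert(e && e->pin_count > 0);
	e->pin_count--;
}

/* HW that walks stage 2 on every access: an unmapped page is a fault. */
static enum toy_status toy_translated_read(struct toy_s2 *s2, uint64_t gpa,
					   uint64_t *spa)
{
	struct toy_s2_entry *e = toy_lookup(s2, gpa);

	if (!e || !e->mapped)
		return TOY_FAULT;
	*spa = e->spa | (gpa & TOY_PAGE_MASK);
	return TOY_OK;
}
```

In this model the translated path needs no extra pin for safety, since a
stale access ends in TOY_FAULT rather than a read of freed memory. The
pinned path is what blocks the unmap for HW that bypasses translation.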