[PATCH v4 11/23] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl

Tue May 20 01:38:39 PDT 2025

Hi Nicolin,

On 5/19/2025 11:44 PM, Nicolin Chen wrote:
> On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
>> Jason, Nicolin, Kevin,
>>
>>
>> On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
>>> On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
>>>> +/**
>>>> + * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
>>>> + * @size: sizeof(struct iommu_hw_queue_alloc)
>>>> + * @flags: Must be 0
>>>> + * @viommu_id: Virtual IOMMU ID to associate the HW queue with
>>>> + * @type: One of enum iommu_hw_queue_type
>>>> + * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
>>>> + *         model
>>>> + * @out_hw_queue_id: The ID of the new HW queue
>>>> + * @base_addr: Base address of the queue memory in guest physical address space
>>>> + * @length: Length of the queue memory in the guest physical address space
>>>> + *
>>>> + * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
>>>> + * allows HW to access a guest queue memory described by @base_addr and @length.
>>>> + * Upon success, the underlying physical pages of the guest queue memory will be
>>>> + * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
>>>> + * destroyed.
>>>
>>> Do we have a way to make the pinning optional?
>>>
>>> As I understand AMD's system the iommu HW itself translates the
>>> base_addr through the S2 page table automatically, so it doesn't need
>>> pinned memory and physical addresses but just the IOVA.
>>
>> Correct. HW will translate GPA -> SPA automatically using the information below.
>>
>> The AMD IOMMU needs a special device ID to be set up with the GPA -> SPA mapping
>> per VM, and it is programmed in the VF Control BAR (VFCntlMMIO Offset
>> {16’b[GuestID], 6’b01_0000} Guest Miscellaneous Control Register). The IOMMU HW
>> will use this address for GPA to SPA translation for buffers like the command
>> buffer.
>>
>> So HW will use the base address (GPA) and the head/tail pointers to get the
>> offset from the base. Then it will use the GPA -> SPA translation.
>>
>>
>>>
>>> Perhaps for this reason the pinning should be done with a function
>>> call from the driver?
>>
>> We still need to make sure the memory allocated for the page is present so
>> that the IOMMU HW can access it.
>>
>> Is pinning at guest boot time enough here, or do we need to take an additional
>> reference in the queue_alloc() path?
> 
> For NVIDIA's vCMDQ that reads host PA directly, pages should be
> pinned once when stage 2 mappings are created for the guest RAM,
> and iommu_hw_queue_alloc() should pin the pages again to prevent
> the gPA from being unmapped in the stage 2 page table. Otherwise
> it will be a security hole, as HW continues to read the unmapped
> memory through physical address space.
> 
> I understand that AMD Command Buffer also needs the S2 mappings
> to be present in order to work correctly. But what happens if
> the queue memory isn't pinned (or even gets unmapped)? Will it
> raise a translation fault, vs. HW reading the unmapped memory?

If the page is unmapped, then the stage 2 (host) page table gets updated. The
IOMMU will not be able to find the page and will log a fault.
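
To illustrate the difference between the two models discussed here, below is a
toy userspace sketch. It is not iommufd or driver code, and every name in it
(s2_map, hw_read_translated, pa_queue_alloc, ...) is made up for illustration.
Model A walks the stage 2 table on every access, so an unmapped page results in
a logged fault. Model B is handed a host PA once at queue allocation, so it
has to hold a pin that blocks the unmap.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NPAGES     8u
#define PAGE_SHIFT 12u
#define PAGE_MASK_LOW ((UINT64_C(1) << PAGE_SHIFT) - 1)

/* Toy stage 2 table: one entry per guest page frame (hypothetical layout). */
struct s2_entry {
	bool valid;        /* false means no GPA -> SPA mapping */
	uint64_t spa;      /* page-aligned system physical address */
	unsigned int pins; /* a non-zero pin count blocks unmap */
};

struct s2_entry s2[NPAGES]; /* zero-initialized: everything unmapped */

void s2_map(unsigned int gfn, uint64_t spa)
{
	s2[gfn].valid = true;
	s2[gfn].spa = spa;
	s2[gfn].pins = 0;
}

/* Returns false when the page is pinned and therefore cannot be unmapped. */
bool s2_unmap(unsigned int gfn)
{
	if (s2[gfn].pins)
		return false;
	s2[gfn].valid = false;
	return true;
}

/* Model A: HW translates the GPA through stage 2 on every access. */
bool hw_read_translated(uint64_t gpa, uint64_t *spa)
{
	uint64_t gfn = gpa >> PAGE_SHIFT;

	if (gfn >= NPAGES || !s2[gfn].valid) {
		fprintf(stderr, "fault: no S2 mapping for GPA 0x%" PRIx64 "\n", gpa);
		return false;
	}
	*spa = s2[gfn].spa | (gpa & PAGE_MASK_LOW);
	return true;
}

/* Model B: HW is given the host PA once, so the page must stay pinned. */
struct pa_queue {
	unsigned int gfn;
	uint64_t base_spa;
};

bool pa_queue_alloc(struct pa_queue *q, uint64_t base_gpa)
{
	uint64_t gfn = base_gpa >> PAGE_SHIFT;

	if (gfn >= NPAGES || !s2[gfn].valid)
		return false;
	s2[gfn].pins++; /* blocks s2_unmap() until the queue is destroyed */
	q->gfn = (unsigned int)gfn;
	q->base_spa = s2[gfn].spa | (base_gpa & PAGE_MASK_LOW);
	return true;
}

void pa_queue_destroy(struct pa_queue *q)
{
	assert(s2[q->gfn].pins > 0);
	s2[q->gfn].pins--;
}
```

In this sketch, unmapping a page under Model A only makes the next access
fault, while under Model B the unmap is refused until pa_queue_destroy() drops
the pin.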

> 
> If so, I think this is Jason's point: a security hole would be
> unlikely, i.e. for AMD, iommu_hw_queue_alloc() pinning the
> physical pages is likely optional.

I think so.

-Vasant