[PATCH v3 0/6] Expose GPU memory as coherently CPU accessible

Wed Apr 12 05:28:08 PDT 2023

On Wed, 05 Apr 2023 19:01:28 +0100,
<ankita at nvidia.com> wrote:
> 
> From: Ankit Agrawal <ankita at nvidia.com>
> 
> NVIDIA's upcoming Grace Hopper Superchip provides a PCI-like device
> for the on-chip GPU that is the logical OS representation of the
> internal propritary cache coherent interconnect.
> 
> This representation has a number of limitations compared to a real PCI
> device, in particular, it does not model the coherent GPU memory
> aperture as a PCI config space BAR, and PCI doesn't know anything
> about cacheable memory types.
> 
> Provide a VFIO PCI variant driver that adapts the unique PCI
> representation into a more standard PCI representation facing
> userspace. The GPU memory aperture is obtained from ACPI, according to
> the FW specification, and exported to userspace as the VFIO_REGION
> that covers the first PCI BAR. qemu will naturally generate a PCI
> device in the VM where the cacheable aperture is reported in BAR1.
> 
> Since this memory region is actually cache coherent with the CPU, the
> VFIO variant driver will mmap it into VMA using a cacheable mapping.
> 
> As this is the first time an ARM environment has placed cacheable
> non-struct page backed memory (eg from remap_pfn_range) into a KVM
> page table, fix a bug in ARM KVM where it does not copy the cacheable
> memory attributes from non-struct page backed PTEs to ensure the guest
> also gets a cacheable mapping.

This is not a bug, but a conscious design decision. As you pointed out
above, nothing needed this until now, and a device mapping is the only
safe thing to do as we know exactly *nothing* about the memory that
gets mapped.

	M.

-- 
Without deviation from the norm, progress is not possible.