[PATCH v2 0/1] KVM: arm64: Map GPU memory with no struct pages
Donald Dutile
ddutile at redhat.com
Tue Nov 26 09:10:24 PST 2024
My email client says this patch: [PATCH v2 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
is part of the thread for this titled PATCH. Is it?
The description has similarities to the description above, but with some additions and some omissions.
So, could you clean these two up into (a) a series, or (b) single, separate PATCHes?
Thanks.
- Don
On 11/18/24 8:19 AM, ankita at nvidia.com wrote:
> From: Ankit Agrawal <ankita at nvidia.com>
>
> Grace based platforms such as the Grace Hopper/Blackwell Superchips
> have CPU accessible, cache coherent GPU memory. The current KVM code
> prevents such memory from being mapped as Normal cacheable; this
> patch aims to enable that use case.
>
> Today KVM forces the memory to either NORMAL or DEVICE_nGnRE
> based on pfn_is_map_memory() and ignores the per-VMA flags that
> indicate the memory attributes. This means there is no way for
> a VM to get cacheable IO memory (like from a CXL or pre-CXL device).
> In both cases the memory will be forced to DEVICE_nGnRE and the
> VM's memory attributes will be ignored.
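
For readers following the thread: the behaviour described above is
roughly the following, a simplified sketch of user_mem_abort() in
arch/arm64/kvm/mmu.c; the exact code varies across kernel versions.

    /*
     * Sketch of the current logic: the stage 2 memory type is derived
     * solely from pfn_is_map_memory(); the VMA's pgprot is never
     * consulted.
     */
    static bool kvm_is_device_pfn(unsigned long pfn)
    {
            return !pfn_is_map_memory(pfn);
    }

    /* ... later, in user_mem_abort() ... */
    if (kvm_is_device_pfn(pfn))
            device = true;
    /* ... */
    if (device)
            prot |= KVM_PGTABLE_PROT_DEVICE; /* forced to DEVICE_nGnRE */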
>
> pfn_is_map_memory() is thus restrictive, allowing only memory
> that has been added to the kernel to be marked as cacheable.
> In most cases the code needs to know if there is a struct page, or
> if the memory is in the kernel map, and pfn_valid() is an appropriate
> API for this. Extend the umbrella with pfn_valid() so that memory
> with no struct pages can be considered for a cacheable stage 2
> mapping. A !pfn_valid() pfn implies that the memory is unsafe to map
> as cacheable.
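
For review purposes, the VMA-driven part could look something like the
below. The helper name is hypothetical; the MT index extraction follows
arm64's PTE_ATTRINDX encoding.

    /*
     * Hypothetical helper: does the VMA's pgprot ask for Normal
     * (cacheable) memory? Only then should a cacheable stage 2
     * mapping be considered.
     */
    static bool kvm_vma_is_cacheable(struct vm_area_struct *vma)
    {
            switch (FIELD_GET(PTE_ATTRINDX_MASK,
                              pgprot_val(vma->vm_page_prot))) {
            case MT_NORMAL:
            case MT_NORMAL_TAGGED:
                    return true;
            default:
                    return false;
            }
    }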
>
> Also take care of the following two cases that are unsafe to map
> as cacheable (roughly sketched below the list):
> 1. The VMA pgprot may have VM_IO set along with MT_NORMAL or
>    MT_NORMAL_TAGGED. Although unexpected and wrong, the presence of
>    such a configuration cannot be ruled out.
> 2. Configurations where VM_MTE_ALLOWED is not set and KVM_CAP_ARM_MTE
>    is enabled. Otherwise a malicious guest can enable MTE at stage 1
>    without the hypervisor being able to tell. This could cause
>    external aborts.
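
Concretely, the two rejections above might be structured like so. This
is a sketch, not the literal patch; "cacheable" stands for the result
of the VMA check sketched earlier.

    /*
     * Case 1: VM_IO combined with a Normal memory type is a bogus
     * configuration; do not honour the cacheable request.
     */
    if ((vma->vm_flags & VM_IO) && kvm_vma_is_cacheable(vma))
            cacheable = false;

    /*
     * Case 2: without VM_MTE_ALLOWED, a guest with KVM_CAP_ARM_MTE
     * enabled could turn on MTE at stage 1 behind the hypervisor's
     * back, risking external aborts on tag accesses.
     */
    if (kvm_has_mte(kvm) && !(vma->vm_flags & VM_MTE_ALLOWED))
            cacheable = false;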
>
> GPU memory such as that on Grace Hopper systems is interchangeable
> with DDR memory and retains its properties. Executable faults should
> thus be allowed on memory determined to be Normal cacheable.
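
In other words, the existing blanket refusal of exec faults on
non-kernel-map pfns would no longer apply once the memory has been
established as Normal cacheable; sketched:

    /*
     * Today exec faults on "device" pfns are refused outright:
     *         if (exec_fault && device)
     *                 return -ENOEXEC;
     * Sketch of the change: memory established as Normal cacheable
     * is not treated as device here, so exec faults on it proceed.
     */
    if (exec_fault && device && !cacheable)
            return -ENOEXEC;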
>
> Note that when FWB is not enabled, the kernel expects to trivially do
> cache management by flushing the memory: a kvm_pte is linearly
> converted to a phys_addr and then to a KVA, see
> kvm_flush_dcache_to_poc(). This is only possible for struct page
> backed memory. Do not allow non-struct page memory to be cacheable
> without FWB.
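
That guard might look roughly as follows; stage2_has_fwb() is the
existing FWB predicate in arch/arm64/kvm/hyp/pgtable.c, and the exact
placement is illustrative.

    /*
     * Without FWB, KVM must perform the cache maintenance itself by
     * converting the PTE to a phys_addr and then to a linear-map KVA
     * (kvm_flush_dcache_to_poc()). That only works when a struct page
     * exists, so refuse cacheable mappings of !pfn_valid() memory.
     */
    if (!stage2_has_fwb(pgt) && !pfn_valid(pfn))
            return -EINVAL; /* no KVA to flush for this memory */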
>
> The changes are heavily influenced by the insightful discussions between
> Catalin Marinas and Jason Gunthorpe [1] on v1. Many thanks for their
> valuable suggestions.
>
> Applied over next-20241117 and tested on the Grace Hopper and
> Grace Blackwell platforms by booting up a VM and running several CUDA
> workloads. This has not been tested on MTE-enabled hardware. If
> someone can give it a try, that would be very helpful.
>
> v1 -> v2
> 1. Removed kvm_is_device_pfn() as the determiner of device type
>    memory; use pfn_valid() instead.
> 2. Added handling for MTE.
> 3. Minor cleanup.
>
> Link: https://lore.kernel.org/lkml/20230907181459.18145-2-ankita@nvidia.com [1]
>
> Ankit Agrawal (1):
> KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
>
> arch/arm64/include/asm/kvm_pgtable.h | 8 +++
> arch/arm64/kvm/hyp/pgtable.c | 2 +-
> arch/arm64/kvm/mmu.c | 101 +++++++++++++++++++++------
> 3 files changed, 87 insertions(+), 24 deletions(-)
>