[PATCH v10 5/6] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
David Hildenbrand
david at redhat.com
Mon Jul 7 00:32:09 PDT 2025
On 05.07.25 09:17, ankita at nvidia.com wrote:
> From: Ankit Agrawal <ankita at nvidia.com>
>
> Today KVM forces the memory to either NORMAL or DEVICE_nGnRE
> based on pfn_is_map_memory (which tracks whether the device memory
> is in the kernel map) and ignores the per-VMA flags that indicate the
> memory attributes. The KVM code is thus restrictive and allows only for
> the memory that is added to the kernel to be marked as cacheable.
>
> The device memory such as on the Grace Hopper/Blackwell systems
> is interchangeable with DDR memory and retains properties such as
> cacheability, unaligned accesses, atomics and handling of executable
> faults. This requires the device memory to be mapped as NORMAL in
> stage-2.
>
> Given that the GPU device memory is not added to the kernel (but is rather
> VMA mapped through remap_pfn_range() in nvgrace-gpu module which sets
> VM_PFNMAP), pfn_is_map_memory() is false and thus KVM prevents such memory
> from being mapped Normal cacheable. The patch aims to solve this use case.
>
> Note when FWB is not enabled, the kernel expects to trivially do
> cache management by flushing the memory, linearly converting a
> kvm_pte to a phys_addr and then to a KVA, see kvm_flush_dcache_to_poc(). The
> cache management thus relies on memory being mapped. Moreover
> ARM64_HAS_CACHE_DIC CPU cap allows KVM to avoid flushing the icache
> and turns icache_inval_pou() into a NOP. These two capabilities
> are thus a requirement of the cacheable PFNMAP feature. Make use of
> kvm_arch_supports_cacheable_pfnmap() to check them.
>
> A cacheability check is made by consulting the VMA pgprot value.
> If the pgprot mapping type is cacheable, it is safe to be mapped S2
> cacheable as the KVM S2 will have the same Normal memory type as the
> VMA has in the S1 and KVM has no additional responsibility for safety.
> Checking pgprot as NORMAL is thus a KVM sanity check.
>
> No additional checks for MTE are needed as kvm_arch_prepare_memory_region()
> already tests it at an early stage during memslot creation. There would
> not even be a fault if the memslot is not created.
>
> CC: Oliver Upton <oliver.upton at linux.dev>
> CC: Sean Christopherson <seanjc at google.com>
> Suggested-by: Jason Gunthorpe <jgg at nvidia.com>
> Suggested-by: Catalin Marinas <catalin.marinas at arm.com>
> Suggested-by: David Hildenbrand <david at redhat.com>
> Tested-by: Donald Dutile <ddutile at redhat.com>
> Signed-off-by: Ankit Agrawal <ankita at nvidia.com>
> ---
Reviewed-by: David Hildenbrand <david at redhat.com>
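
For readers following the thread, the stage-2 memory-type decision described
in the quoted commit message can be sketched roughly as below. This is a
user-space model only, not the kernel code: struct fault_ctx, enum s2_memtype,
supports_cacheable_pfnmap() and pick_s2_memtype() are hypothetical names made
up for illustration, and treating a missing capability as a rejected mapping
is an assumption (the commit message only says the two capabilities are
required).

```c
#include <stdbool.h>

/* Hypothetical result of the stage-2 attribute decision. */
enum s2_memtype { S2_NORMAL, S2_DEVICE_nGnRE, S2_REJECT };

/* Hypothetical inputs; each field models a check named in the commit message. */
struct fault_ctx {
    bool pfn_in_kernel_map;    /* models pfn_is_map_memory() */
    bool vma_is_pfnmap;        /* models VM_PFNMAP set on the VMA */
    bool vma_pgprot_cacheable; /* models a Normal (cacheable) VMA pgprot */
    bool has_fwb;              /* models stage-2 FWB being enabled */
    bool has_cache_dic;        /* models ARM64_HAS_CACHE_DIC */
};

/* Models the role of kvm_arch_supports_cacheable_pfnmap(): both caps needed. */
static bool supports_cacheable_pfnmap(const struct fault_ctx *c)
{
    return c->has_fwb && c->has_cache_dic;
}

static enum s2_memtype pick_s2_memtype(const struct fault_ctx *c)
{
    /* Memory in the kernel map is mapped Normal, as before the patch. */
    if (c->pfn_in_kernel_map)
        return S2_NORMAL;

    /*
     * Cacheable PFNMAP memory: Normal only if cache maintenance can be
     * skipped (FWB + DIC). Rejecting otherwise is an assumption here.
     */
    if (c->vma_is_pfnmap && c->vma_pgprot_cacheable)
        return supports_cacheable_pfnmap(c) ? S2_NORMAL : S2_REJECT;

    /* Everything else keeps the device memory type. */
    return S2_DEVICE_nGnRE;
}
```

The point of the ordering is that the pgprot check only matters for memory
that is not in the kernel map, which is the nvgrace-gpu case described above.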
--
Cheers,
David / dhildenb