[PATCH v9 5/6] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
Jason Gunthorpe
jgg at nvidia.com
Fri Jul 4 07:04:31 PDT 2025
On Sat, Jun 21, 2025 at 04:21:10AM +0000, ankita at nvidia.com wrote:
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1681,18 +1681,53 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (is_error_noslot_pfn(pfn))
> return -EFAULT;
>
> + /*
> + * Check if this is a non-struct-page memory PFN that cannot support
> + * CMOs. It could potentially be unsafe to access as cacheable.
> + */
> if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(pfn)) {
> /*
> - * If the page was identified as device early by looking at
> - * the VMA flags, vma_pagesize is already representing the
> - * largest quantity we can map. If instead it was mapped
> - * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> - * and must not be upgraded.
> - *
> - * In both cases, we don't let transparent_hugepage_adjust()
> - * change things at the last minute.
> + * COW VM_PFNMAP is possible when doing a MAP_PRIVATE
> + * /dev/mem mapping on systems that allow such a mapping.
> + * Reject that case.
> */
> - s2_force_noncacheable = true;
> + if (is_cow_mapping(vm_flags))
> + return -EINVAL;
I still would like an explanation of why we need to block this.
COW PFNMAP is like MIXEDMAP: you end up with a VMA containing a
mixture of MMIO and normal pages. Arguably you are supposed to use
vm_normal_page(), not pfn_is_map_memory(), but that seems difficult
for KVM.
Given that we exclude the cacheable case with the pfn_is_map_memory()
check, we already know this is non-struct-page memory, so why do we
need to block the COW?
I think the basic rule we are going for is that, within the VMA, the
non-normal/special PTEs have to follow vma->vm_page_prot while the
normal pages have to be cacheable.
So if we find a normal page (i.e. pfn_is_map_memory() returns true)
then we know it is cacheable and s2_force_noncacheable = false.
Otherwise we use vm_page_prot to decide if the special PTE is
cacheable.
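In code, reusing the names from this patch, that rule would boil down
to something like (untested sketch):

	if (pfn_is_map_memory(pfn)) {
		/* Normal struct-page memory is always safe to map cacheable */
		s2_force_noncacheable = false;
	} else {
		/* Special PFN: honour what the VMA owner put in vm_page_prot */
		s2_force_noncacheable = !is_vma_cacheable;
	}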
David, can you think of any reason to have this is_cow_mapping() test?
> + if (is_vma_cacheable) {
> + /*
> + * Whilst the VMA owner expects a cacheable mapping of this
> + * PFN, the hardware also has to support the S2FWB and CACHE
> + * DIC features.
> + *
> + * arm64 KVM relies on a kernel VA mapping of the PFN to
> + * perform cache maintenance, as the CMO instructions work on
> + * virtual addresses. VM_PFNMAP regions are not necessarily
> + * mapped to a KVA, hence the hardware features S2FWB and
> + * CACHE DIC are mandatory for cache maintenance.
> + *
> + * Check if the hardware supports them before honouring the
> + * VMA owner's request for a cacheable mapping.
> + */
> + if (!kvm_arch_supports_cacheable_pfnmap())
> + return -EFAULT;
> +
> + /* Cannot degrade cacheable to non-cacheable */
> + if (s2_force_noncacheable)
> + return -EINVAL;
What am I missing? After the whole series is applied, this is the first
reference to s2_force_noncacheable after it is initialized to false, so
this can't happen?
> + } else {
> + /*
> + * If the page was identified as device early by looking at
> + * the VMA flags, vma_pagesize is already representing the
> + * largest quantity we can map. If instead it was mapped
> + * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> + * and must not be upgraded.
> + *
> + * In both cases, we don't let transparent_hugepage_adjust()
> + * change things at the last minute.
> + */
> + s2_force_noncacheable = true;
> + }
Then the logic that immediately follows:

	if (is_vma_cacheable && s2_force_noncacheable)
		return -EINVAL;

doesn't make a lot of sense either. The only place that sets
s2_force_noncacheable = true is the else branch of
'if (is_vma_cacheable)', so this is dead code too.
Seems like this still needs some cleanup to remove these impossible
conditions. The logic makes sense to me otherwise, though.
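IOW, leaving aside the is_cow_mapping() question above, I'd expect the
hunk to collapse to something like this once the impossible branches
are dropped (untested):

	if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(pfn)) {
		if (is_vma_cacheable) {
			/* CMOs need a KVA, so S2FWB + CACHE DIC are required */
			if (!kvm_arch_supports_cacheable_pfnmap())
				return -EFAULT;
		} else {
			/* Don't let transparent_hugepage_adjust() upgrade this */
			s2_force_noncacheable = true;
		}
	}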
Jason