[PATCH] KVM: arm64: Skip CMOs when updating a PTE pointing to non-memory

Tue Apr 27 15:52:46 BST 2021

Hi,

I've been trying to reproduce the panic, but I haven't had any success.

With a known working PCI passtrough device, this is how I changed kvmtool:

diff --git a/vfio/core.c b/vfio/core.c
index 3ff2c0b..b4ee7e9 100644
--- a/vfio/core.c
+++ b/vfio/core.c
@@ -261,6 +261,9 @@ int vfio_map_region(struct kvm *kvm, struct vfio_device *vdev,
                return ret;
        }
 
+       char c = *(char *)base;
+       fprintf(stderr, "c = %c\n", c);
+
        return 0;
 }
 
What the change is doing is reading from the BAR region after it's has been
mmap'ed into userspace. I can see that the read hits vfio_pci_mmap_fault(), which
calls io_remap_pfn_range(), but I can't figure out how I can trigger the MMU
notifiers. Any suggestions?

The comment [1] suggested that the panic is triggered during page aging.
vfio_pci_mmap() sets the VM_PFNMAP for the VMA and I see in the Documentation that
pages with VM_PFNMAP are added to the unevictable LRU list, doesn't that mean it's
not subject the page aging? I feel like there's something I'm missing.

[1]
https://lore.kernel.org/kvm/BY5PR12MB37642B9AC7E5D907F5A664F6B3459@BY5PR12MB3764.namprd12.prod.outlook.com/

Thanks,

Alex

On 4/26/21 11:36 AM, Marc Zyngier wrote:
> Sumit Gupta and Krishna Reddy both reported that for MMIO regions
> mapped into userspace using VFIO, a PTE update can trigger a MMU
> notifier reaching kvm_set_spte_hva().
>
> There is an assumption baked in kvm_set_spte_hva() that it only
> deals with memory pages, and not MMIO. For this purpose, it
> performs a cache cleaning of the potentially newly mapped page.
> However, for a MMIO range, this explodes as there is no linear
> mapping for this range (and doing cache maintenance on it would
> make little sense anyway).
>
> Check for the validity of the page before performing the CMO
> addresses the problem.
>
> Reported-by: Krishna Reddy <vdumpa at nvidia.com>
> Reported-by: Sumit Gupta <sumitg at nvidia.com>,
> Tested-by: Sumit Gupta <sumitg at nvidia.com>,
> Signed-off-by: Marc Zyngier <maz at kernel.org>
> Link: https://lore.kernel.org/r/5a8825bc-286e-b316-515f-3bd3c9c70a80@nvidia.com
> ---
>  arch/arm64/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index cd4d51ae3d4a..564a0f7fcd05 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1236,7 +1236,8 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>  	 * We've moved a page around, probably through CoW, so let's treat it
>  	 * just like a translation fault and clean the cache to the PoC.
>  	 */
> -	clean_dcache_guest_page(pfn, PAGE_SIZE);
> +	if (!kvm_is_device_pfn(pfn))
> +		clean_dcache_guest_page(pfn, PAGE_SIZE);
>  	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
>  	return 0;
>  }