[RFC 1/2] vfio/pci: keep the prefetchable attribute of a BAR region in VMA
Alex Williamson
alex.williamson at redhat.com
Thu Apr 29 19:28:40 BST 2021
On Thu, 29 Apr 2021 11:29:05 -0500
Shanker Donthineni <sdonthineni at nvidia.com> wrote:
> For pass-through device assignment, the ARM64 KVM hypervisor retrieves
> the memory region properties (physical address, size, and whether the
> region is backed by struct page) from the VMA. The prefetchable
> attribute of a BAR region isn't visible to KVM, so it cannot make an
> optimal decision for the stage-2 attributes.
>
> This patch updates vma->vm_page_prot and maps the region with the
> write-combine attribute if the associated BAR is prefetchable. On ARM64,
> pgprot_writecombine() maps to the memory type MT_NORMAL_NC, which
> has no side effects on reads and allows multiple writes to be combined.
>
> Signed-off-by: Shanker Donthineni <sdonthineni at nvidia.com>
> ---
> drivers/vfio/pci/vfio_pci.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 5023e23db3bc..1b734fe1dd51 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1703,7 +1703,11 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>  	}
>
>  	vma->vm_private_data = vdev;
> -	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +	if (IS_ENABLED(CONFIG_ARM64) &&
> +	    (pci_resource_flags(pdev, index) & IORESOURCE_PREFETCH))
> +		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
> +	else
> +		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

If this were a valid thing to do, it should be done for all
architectures, not just ARM64. However, a prefetchable range only
guarantees that merged writes are allowed, which seems like a subset of
the semantics implied by a WC attribute, so this doesn't seem
universally valid.
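
To make the first point concrete, here is a minimal user-space sketch of what the selection might look like without the architecture check. It is a model, not kernel code: enum map_attr, pick_map_attr() and the allow_wc opt-in are hypothetical names introduced for illustration, and only the IORESOURCE_PREFETCH value is taken from include/linux/ioport.h. It does not settle whether a prefetchable BAR justifies WC, which is the open question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as IORESOURCE_PREFETCH in include/linux/ioport.h. */
#define IORESOURCE_PREFETCH 0x00002000UL

/* Hypothetical stand-in for the pgprot_*() helper chosen at mmap time. */
enum map_attr { MAP_NONCACHED, MAP_WRITECOMBINE };

/*
 * Architecture-neutral form of the check in the patch. allow_wc is an
 * assumed opt-in, since a prefetchable BAR alone may not justify WC.
 */
enum map_attr pick_map_attr(unsigned long res_flags, bool allow_wc)
{
	if (allow_wc && (res_flags & IORESOURCE_PREFETCH))
		return MAP_WRITECOMBINE;
	return MAP_NONCACHED;	/* current behaviour: uncached */
}

/* Usage example: only an opted-in, prefetchable BAR is mapped WC. */
void pick_map_attr_example(void)
{
	assert(pick_map_attr(IORESOURCE_PREFETCH, true) == MAP_WRITECOMBINE);
	assert(pick_map_attr(IORESOURCE_PREFETCH, false) == MAP_NONCACHED);
	assert(pick_map_attr(0, true) == MAP_NONCACHED);
}
```

Keeping the decision in one helper with an explicit opt-in means the uncached default stays in place unless something other than the prefetchable bit also says WC is acceptable.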

I'm also a bit confused by your problem statement, which indicates that
without WC you're seeing unaligned accesses. Does this suggest that
your driver is actually relying on WC semantics to perform merging to
achieve alignment? That seems rather like a driver bug; I'd expect UC
vs WC to be largely a difference in performance, not a means to enforce
proper driver access patterns. Per the PCI spec, the bridge itself can
merge writes to prefetchable areas, presumably regardless of this
processor attribute. Perhaps that's the feature your driver is relying
on that might be missing here.

Thanks,
Alex
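
As an illustration of the access-pattern point above, the following user-space sketch shows a copy routine that issues only naturally aligned 32-bit stores itself, so it does not depend on WC merging to produce aligned accesses. mmio_write_aligned32() is a hypothetical helper, the destination is an ordinary array standing in for a mapped BAR, and the length is assumed to be a multiple of four bytes. In the kernel the stores would go through accessors such as writel() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes to a (modelled) device window using only aligned
 * 32-bit stores. Assumes dst is 4-byte aligned and len is a multiple
 * of 4; src may have any alignment.
 */
void mmio_write_aligned32(volatile uint32_t *dst, const void *src, size_t len)
{
	const unsigned char *p = src;
	size_t i;

	assert(((uintptr_t)dst % 4) == 0);
	assert((len % 4) == 0);

	for (i = 0; i < len / 4; i++) {
		uint32_t w;

		memcpy(&w, p + 4 * i, sizeof(w));	/* safe unaligned load */
		dst[i] = w;				/* one aligned store */
	}
}
```

With a routine like this, the mapping attribute (UC or WC) affects only how the stores are buffered, not whether each access is aligned.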