[RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory
Benjamin Herrenschmidt
benh at kernel.crashing.org
Sat Apr 15 20:01:59 PDT 2017
On Sat, 2017-04-15 at 15:09 -0700, Dan Williams wrote:
> I'm wondering, since this is limited to support behind a single
> switch, if you could have a software-iommu hanging off that switch
> device object that knows how to catch and translate the non-zero
> offset bus address case. We have something like this with VMD driver,
> and I toyed with a soft pci bridge when trying to support AHCI+NVME
> bar remapping. When the dma api looks up the iommu for its device it
> hits this soft-iommu and that driver checks if the page is host memory
> or device memory to do the dma translation. You wouldn't need a bit in
> struct page, just a lookup to the hosting struct dev_pagemap in the
> is_zone_device_page() case and that can point you to p2p details.
I was thinking about a hook in the arch DMA ops, but that kind of
wrapper might indeed work instead. However, I'm not sure what the
best way to "instantiate" it would be.
The main issue is that the DMA ops are a function of the initiator,
not the target (since the target is supposed to be memory), so things
are a bit awkward.
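(For reference, the generic dma_map_page() path in dma-mapping.h
picks the ops purely off the initiating device; stripped of the
debug hooks it's roughly:

	static inline dma_addr_t dma_map_page(struct device *dev,
			struct page *page, size_t offset, size_t size,
			enum dma_data_direction dir, unsigned long attrs)
	{
		const struct dma_map_ops *ops = get_dma_ops(dev);

		/* the target page is just an argument here, it
		 * never gets a say in which ops run */
		return ops->map_page(dev, page, offset, size, dir, attrs);
	}

so the target never influences the ops selection.)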
Someone (the user?) would have to know that a given device "intends"
to DMA directly to another device.
This is awkward because, ideally, this isn't something the device
itself knows. For example, one could want an existing NIC to DMA
directly to/from NVMe pages or GPU pages.
The NIC itself doesn't know the characteristics of these pages, but
*something* needs to insert itself into the DMA ops of that bridge to
make it possible.
That's why I wonder if it's the struct page of the target that should
be "marked" in such a way that the arch DMA ops can immediately catch
that it belongs to a device and might require "wrapped" operations.
Are ZONE_DEVICE pages identifiable based on the struct page alone? (a
flag?)
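(Partially answering myself: looking at mm.h, the
is_zone_device_page() helper you mention already does exactly that
off the zone number encoded in page->flags:

	static inline bool is_zone_device_page(const struct page *page)
	{
		return page_zonenum(page) == ZONE_DEVICE;
	}

so no new page flag seems needed, at least for the "is it device
memory" part.)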
That would allow us to keep a fast path for normal memory targets,
but also have some kind of way to handle the special cases of such
peer-to-peer transfers (or other types of peer-to-peer that don't
necessarily involve PCI address wrangling but could require
additional iommu bits).
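Something like this strawman is what I have in mind for the wrapped
op (every name below apart from is_zone_device_page() is invented,
none of this exists today):

	static dma_addr_t p2p_wrap_map_page(struct device *dev,
			struct page *page, unsigned long offset,
			size_t size, enum dma_data_direction dir,
			unsigned long attrs)
	{
		/* fast path: normal host memory goes straight to
		 * the real arch ops (arch_ops: the saved original
		 * dma_map_ops, hypothetical) */
		if (likely(!is_zone_device_page(page)))
			return arch_ops->map_page(dev, page, offset,
						  size, dir, attrs);

		/* slow path: find the hosting dev_pagemap and apply
		 * whatever bus offset / iommu bits the peer needs
		 * (p2p_translate is made up) */
		return p2p_translate(dev, page->pgmap, page, offset,
				     size, dir);
	}

The fast path stays one branch away from today's behaviour, which
would hopefully keep the normal-memory case cheap.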
Just thinking out loud ... I don't have a firm idea or a design. But
peer-to-peer is definitely a problem we need to tackle generically;
the demand for it keeps coming up.
Cheers,
Ben.