[RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

Jason Gunthorpe jgunthorpe at obsidianresearch.com
Tue Apr 18 11:00:20 PDT 2017


On Tue, Apr 18, 2017 at 10:27:47AM -0700, Dan Williams wrote:
> > FWIW, RDMA probably wouldn't want to use a p2mem device either, we
> > already have APIs that map BAR memory to user space, and would like to
> > keep using them. A 'enable P2P for bar' helper function sounds better
> > to me.
> 
> ...and I think it's not a helper function as much as asking the bus
> provider "can these two device dma to each other".

What I mean I could write in a RDMA driver:

  /* Allow the memory in BAR 1 to be the target of P2P transactions */
  pci_enable_p2p_bar(dev, 1);

And not require anything else..

> The "helper" is the dma api redirecting through a software-iommu
> that handles bus address translation differently than it would
> handle host memory dma mapping.

Not sure, until we see what arches actually need to do here it is hard
to design common helpers.

Here are a few obvious things that arches will need to implement to
support this broadly:

- Virtualization might need to do a hypervisor call to get the right
  translation, or consult some hypervisor specific description table.

- Anything using IOMMUs for virtualization will need to setup IOMMU
  permissions to allow the P2P flow, this might require translation to
  an address cookie.

- Fail if the PCI devices are in different domains, or setup hardware to
  do completion bus/device/function translation.

- All platforms can succeed if the PCI devices are under the same
  'segment', but where segments begin is somewhat platform specific
  knowledge. (this is 'same switch' idea Logan has talked about)

So, we can eventually design helpers for various common scenarios, but
until we see what arch code actually needs to do it seems
premature. Much of this seems to involve interaction with some kind of
hardware, or consulation of some kind of currently platform specific
data, so I'm not sure what a software-iommu would be doing??

The main thing to agree on is that this code belongs under dma ops and
that arches have to support struct page mapped BAR addresses in their
dma ops inputs. Is that resonable?

Jason



More information about the Linux-nvme mailing list