[RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

Dan Williams dan.j.williams at intel.com
Tue Apr 18 15:28:17 PDT 2017


On Tue, Apr 18, 2017 at 3:15 PM, Logan Gunthorpe <logang at deltatee.com> wrote:
>
>
> On 18/04/17 03:36 PM, Dan Williams wrote:
>> On Tue, Apr 18, 2017 at 2:22 PM, Jason Gunthorpe
>> <jgunthorpe at obsidianresearch.com> wrote:
>>> On Tue, Apr 18, 2017 at 02:11:33PM -0700, Dan Williams wrote:
>>>>> I think this opens an even bigger can of worms.
>>>>
>>>> No, I don't think it does. You'd only shim when the target page is
>>>> backed by a device, not host memory, and you can figure this out by an
>>>> is_zone_device_page()-style lookup.
>>>
>>> The bigger can of worms is how you meaningfully stack dma_ops.
>>
>> This goes back to my original comment to make this capability a
>> function of the pci bridge itself. The kernel has an implementation of
>> a dynamically created bridge device that injects its own dma_ops for
>> the devices behind the bridge. See vmd_setup_dma_ops() in
>> drivers/pci/host/vmd.c.
>
> Well the issue I think Jason is pointing out is that the ops don't
> stack. The map_* function in the injected dma_ops needs to be able to
> call the original map_* for any page that is not p2p memory. This is
> especially annoying in the map_sg function which may need to call a
> different op based on the contents of the sgl. (And please correct me if
> I'm not seeing how this can be done in the vmd example.)

Unlike the PCI bus address offset case, which I think is fundamental to
support since shipping archs do this today, I think it is OK to say
that p2p is restricted so that a single sgl talks either to host memory
or to a single device. That said, what's wrong with a p2p-aware map_sg
implementation calling up to the host memory map_sg implementation on
a per-sgl basis?
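A minimal userspace model of that per-sgl stacking, assuming each sgl is homogeneous (all host pages, or all pages from one device). Everything here is hypothetical: mock_page, mock_sg, mock_dma_ops, host_map_sg(), p2p_map_sg() and P2P_BUS_OFFSET are simplified stand-ins, not the kernel's struct page, struct scatterlist or struct dma_map_ops. The p2p_dev field stands in for an is_zone_device_page()-style lookup, and the fixed offset stands in for the PCI bus address translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct page / struct scatterlist. */
struct mock_page {
	uint64_t phys;  /* CPU physical address */
	int p2p_dev;    /* -1: host memory; >= 0: id of the backing device */
};

struct mock_sg {
	struct mock_page *page;
	uint64_t dma_addr;  /* filled in by map_sg */
};

/* Hypothetical one-entry ops table: returns entries mapped, 0 on failure. */
struct mock_dma_ops {
	int (*map_sg)(struct mock_sg *sgl, int nents);
};

/* Host-memory mapping: identity here; an IOMMU would translate for real. */
static int host_map_sg(struct mock_sg *sgl, int nents)
{
	for (int i = 0; i < nents; i++)
		sgl[i].dma_addr = sgl[i].page->phys;
	return nents;
}

static const struct mock_dma_ops host_ops = { .map_sg = host_map_sg };

/* The ops that were in place before the p2p-aware ops were injected. */
static const struct mock_dma_ops *orig_ops = &host_ops;

/* Assumed fixed CPU-to-PCI-bus offset for device (BAR) memory. */
#define P2P_BUS_OFFSET 0x100000000ULL

static int p2p_map_sg(struct mock_sg *sgl, int nents)
{
	if (nents <= 0)
		return 0;

	int target = sgl[0].page->p2p_dev;

	/* Enforce the restriction: one sgl, one target (host or one device). */
	for (int i = 1; i < nents; i++)
		if (sgl[i].page->p2p_dev != target)
			return 0;

	/* Host memory: call up to the original map_sg for the whole sgl. */
	if (target < 0)
		return orig_ops->map_sg(sgl, nents);

	/* Device memory: translate CPU physical to PCI bus address. */
	for (int i = 0; i < nents; i++)
		sgl[i].dma_addr = sgl[i].page->phys + P2P_BUS_OFFSET;
	return nents;
}

static const struct mock_dma_ops p2p_ops = { .map_sg = p2p_map_sg };
```

With p2p_ops installed, a host-only sgl still ends up in host_map_sg(), a device-only sgl gets bus addresses, and a mixed sgl is refused rather than half-mapped.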

> Also, what happens if p2p pages end up getting passed to a device that
> doesn't have the injected dma_ops?

This goes back to limiting p2p to a single PCI host bridge. If the p2p
capability is coordinated with the bridge rather than between the
individual devices, then we have a central point to catch this case.

...of course this is all hand-wavy until someone writes the code and
proves otherwise.
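One hedged reading of "a central point to catch this case", as a userspace model: the bridge knows which devices sit behind it (and therefore which ones received its dma_ops), so a single check can refuse p2p mappings whose initiator or provider is outside that set. mock_bridge, bridge_owns() and bridge_p2p_allowed() are hypothetical names for illustration, not existing kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8

/* Hypothetical model of a p2p-capable PCI host bridge. */
struct mock_bridge {
	int devs[MAX_DEVS];  /* ids of the devices behind this bridge */
	size_t ndevs;
};

static bool bridge_owns(const struct mock_bridge *br, int dev)
{
	for (size_t i = 0; i < br->ndevs; i++)
		if (br->devs[i] == dev)
			return true;
	return false;
}

/*
 * Single decision point: 'initiator' may map p2p memory exported by
 * 'provider' only if both sit behind this bridge, i.e. only if the
 * initiator is one of the devices that got the bridge's dma_ops.
 */
static bool bridge_p2p_allowed(const struct mock_bridge *br,
			       int initiator, int provider)
{
	return bridge_owns(br, initiator) && bridge_owns(br, provider);
}
```

The design point is that the check lives in one place keyed off the bridge topology, instead of every pair of drivers negotiating p2p between themselves.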

> However, the concept of replacing the dma_ops for all devices behind a
> supporting bridge is interesting and may be a good piece of the final
> solution.

It's at least a proof point for injecting special behavior for devices
behind a (virtual) pci bridge without needing to go touch a bunch of
drivers.
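To illustrate "without needing to go touch a bunch of drivers", here is a userspace model loosely inspired by the idea behind vmd_setup_dma_ops(), not a copy of its code: the bridge swaps in its own ops for each device behind it and saves the previous ops so its wrapper can call down to them, while "driver" code keeps calling through dev->dma_ops unchanged. mock_device, mock_dma_ops, bridge_inject_ops(), driver_map_buffer() and the bridge_map_calls counter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct mock_device;

/* Hypothetical, reduced ops table: map one buffer, return a bus address. */
struct mock_dma_ops {
	unsigned long (*map)(struct mock_device *dev, unsigned long phys);
};

struct mock_device {
	const struct mock_dma_ops *dma_ops;   /* what drivers call through */
	const struct mock_dma_ops *orig_ops;  /* saved by the bridge for stacking */
};

static unsigned long direct_map(struct mock_device *dev, unsigned long phys)
{
	(void)dev;
	return phys;  /* identity mapping, as with no IOMMU */
}

static const struct mock_dma_ops direct_ops = { .map = direct_map };

/* Counts mappings that went through the bridge, to show interception. */
static int bridge_map_calls;

/* Wrapper installed by the bridge: do bridge work, then call down. */
static unsigned long bridge_map(struct mock_device *dev, unsigned long phys)
{
	bridge_map_calls++;
	return dev->orig_ops->map(dev, phys);
}

static const struct mock_dma_ops bridge_ops = { .map = bridge_map };

/* Inject the bridge's ops into every device behind it. */
static void bridge_inject_ops(struct mock_device *devs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		devs[i].orig_ops = devs[i].dma_ops;
		devs[i].dma_ops = &bridge_ops;
	}
}

/* Unmodified "driver" code: it only ever goes through dev->dma_ops. */
static unsigned long driver_map_buffer(struct mock_device *dev,
				       unsigned long phys)
{
	return dev->dma_ops->map(dev, phys);
}
```

After bridge_inject_ops() runs, driver_map_buffer() returns the same address as before, but every call now passes through bridge_map() first.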



More information about the Linux-nvme mailing list