[PATCH v10] PCI: tango: Add MSI controller support

Mason slash.tmp at free.fr
Sat Aug 26 11:12:23 PDT 2017


On 26/08/2017 15:08, Marc Zyngier wrote:

> On Aug 25 2017 at 18:44, Mason wrote:
>
>> On 25/08/2017 17:45, Robin Murphy wrote:
>>
>>> On 25/08/17 16:35, Mason wrote:
>>>
>>>> On 25/08/2017 17:25, Robin Murphy wrote:
>>>>
>>>>> On 25/08/17 16:01, Mason wrote:
>>>>>
>>>>>> Robin wrote a prophetic post back in March:
>>>>>> http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/492965.html
>>>>>>
>>>>>>> The appropriate DT property would be "dma-ranges", i.e.
>>>>>>>
>>>>>>> pci@... {
>>>>>>> 	...
>>>>>>> 	dma-ranges = <(PCI bus address) (CPU phys address) (size)>;
>>>>>>> }
>>>>>>
>>>>>> The dma-ranges property seems to be exactly what I'm looking for:
>>>>>>
>>>>>> Restrict DMA to the first X MB of RAM (use a bounce buffer
>>>>>> for other physical addresses).
>>>>>>
>>>>>> I added the following property to my PCIe node
>>>>>>
>>>>>>   dma-ranges = <0x0 0x80000000 0x80000000 0x20000000>;
>>>>>>
>>>>>> with the intent to create a 1:1 mapping for [0x80000000, 0xa0000000[
>>>>>>
>>>>>> But it does not work. Arg!
>>>>>>
>>>>>> My PCIe controller driver seems to be correctly calling of_dma_get_range:
>>>>>>
>>>>>> [ 0.520469] [<c03d85e8>] (of_dma_get_range) from [<c03d5ad8>] (of_dma_configure+0x48/0x234)
>>>>>> [ 0.520483] [<c03d5ad8>] (of_dma_configure) from [<c02fa154>] (pci_device_add+0xac/0x350)
>>>>>> [ 0.520493] [<c02fa154>] (pci_device_add) from [<c02fa488>] (pci_scan_single_device+0x90/0xb0)
>>>>>> [ 0.520501] [<c02fa488>] (pci_scan_single_device) from [<c02fa500>] (pci_scan_slot+0x58/0x100)
>>>>>> [ 0.520510] [<c02fa500>] (pci_scan_slot) from [<c02fb418>] (pci_scan_child_bus+0x20/0xf8)
>>>>>> [ 0.520519] [<c02fb418>] (pci_scan_child_bus) from [<c02fb6e8>] (pci_scan_root_bus_msi+0xcc/0xd8)
>>>>>> [ 0.520527] [<c02fb6e8>] (pci_scan_root_bus_msi) from [<c02fb70c>] (pci_scan_root_bus+0x18/0x20)
>>>>>> [ 0.520537] [<c02fb70c>] (pci_scan_root_bus) from [<c0310544>] (pci_host_common_probe+0xc8/0x314)
>>>>>> [ 0.520546] [<c0310544>] (pci_host_common_probe) from [<c0310ce8>] (tango_pcie_probe+0x148/0x350)
>>>>>> [ 0.520557] [<c0310ce8>] (tango_pcie_probe) from [<c034d398>] (platform_drv_probe+0x34/0x6c)
>>>>>>
>>>>>> of_dma_get_range() is called on the pcie node (which is expected)
>>>>>> but after parsing n_addr_cells and n_size_cells in the while loop,
>>>>>> the code jumps to the parent node ("soc")... while my property is
>>>>>> attached to the pcie node...
>>>>>
>>>>> This is not your driver calling of_dma_get_range(), this is the PCI core
>>>>> doing so in the act of DMA master configuration for a discovered
>>>>> *endpoint*. The fact that the "pass the host controller's OF node
>>>>> because we don't have one for the endpoint" bodge only works properly
>>>>> for dma-coherent and not dma-ranges is a known, but irrelevant, problem.
>>>>>
>>>>> If your host controller driver needs to discover its windows from DT to
>>>>> configure *itself*, it needs to parse dma-ranges itself; see pcie-iproc,
>>>>> pcie-rcar, pcie-xgene, etc. for examples.
>>>>
>>>> Yes, I'm aware that I need to do my own parsing of dma-ranges.
>>>> I can use that information to configure BAR0.base and the
>>>> region registers.
>>>>
>>>> But Linux needs to record my settings at some point, right?
>>>> Otherwise, how does the DMA framework know that devices can
>>>> only reach cpu addresses [0x80000000, 0xa0000000[ and when
>>>> to use bounce buffers?
>>>>
>>>> What's preventing the XHCI driver from allocating memory
>>>> outside of my "safe" range, and having the DMA framework
>>>> blindly map that?
>>>
>>> At the moment, nothing. Systems that have physical memory that is not
>>> visible in PCI mem space are having a bad time and will not go to space
>>> today.
>>>
>>> But that bears no relation to your MSI controller getting its doorbell
>>> address set appropriately.
>>
>> OK, so this is what I propose for v11 in order to not
>> hard code the MSI doorbell address (e.g. 0xa002e07c)
>>
>> I add the following property to the pcie node:
>>
>> 	dma-ranges = <0x0 0x80000000 0x80000000 0x20000000>;
>>
>> I.e. pci_addr = 0x80000000, cpu_addr = 0x80000000, len=0x20000000
>>
>> Then in the PCIe driver, I parse dma-ranges.
>>
>> Consequently
>>
>> 	MSI_doorbell_addr = cpu_addr + len + res.start + 0x7c
>>
>> Bjorn, Marc, Robin, is that an acceptable solution?
> 
> It seems to work, but I still have my doubts about this BAR0.base and
> the associated regions. Are these regions so hardcoded in HW that the RC
> cannot DMA outside of this 1GB region? Or can it be reconfigured by some
> SW agent to cover more RAM, should someone decide that 1GB is on the
> "too little" side?
> 
> If the former is true, the HW is remarkably busted and/or inflexible.

This HW block has already been deemed insane because
of the muxing of mem and config space... So you're
late to the party :-)

I wouldn't call the regions "hard-coded" since they are
software-configurable to point anywhere in the CPU bus.
(Although I'm not sure if that's any use, since the DMA
framework seems to expect a 1:1 mapping.)

But the other side (PCI bus) is quite inflexible:
accesses to addresses outside the window defined by BAR0
are silently ignored, no working around that :-(

> If the latter is true, then the dma-ranges property feels very fragile, as
> it must be kept in sync with the amount of memory that the system has.

I'm confused.

As I pointed out, the dma-ranges in the pcie node is
ignored by the DMA framework. And Robin confirmed that
"Systems that have physical memory that is not visible
in PCI mem space are having a bad time and will not go
to space today."

So there are several setups where something is bound
to break:

1) Linux manages more than 1 GB (contiguous)
=> one region needs to point to the doorbell area, so 128 MB
of the window are not available for RAM.

2) Linux manages non-contiguous memory
=> e.g. 128MB at 0x80000000 + 128MB at 0xc0000000

That's why I've asked about bounce buffers.
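
The two setups can be checked mechanically. This is only a sketch of
the test a DMA layer would need, not existing kernel code; struct range,
range_covered() and banks_needing_bounce() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t size;
};

/* True if [r.start, r.start + r.size) lies entirely inside win. */
static bool range_covered(struct range win, struct range r)
{
	/* Ordered so that neither subtraction can underflow. */
	return r.start >= win.start &&
	       r.size <= win.size &&
	       r.start - win.start <= win.size - r.size;
}

/*
 * Count the memory banks that a PCI device could not reach through
 * the window, i.e. the banks that would need bounce buffering.
 */
static unsigned int banks_needing_bounce(struct range win,
					 const struct range *banks,
					 unsigned int n)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < n; i++)
		if (!range_covered(win, banks[i]))
			count++;
	return count;
}
```

With the window [0x80000000, 0xa0000000[ from dma-ranges, setup 2
(128MB at 0x80000000 + 128MB at 0xc0000000) leaves one bank
unreachable, while a single 512 MB bank at 0x80000000 leaves none.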

The system I test on boots with mem=512MB
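
For that configuration, the doorbell formula proposed for v11 can be
worked through as below. This is a sketch, not the driver: struct
dma_range and msi_doorbell_addr() are illustrative names, and the value
0x2e000 for res.start is an assumption inferred from the example
doorbell address 0xa002e07c.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_DOORBELL_OFFSET 0x7c	/* register offset from the v11 formula */

/* One dma-ranges entry as described in the thread. */
struct dma_range {
	uint32_t pci_addr;
	uint32_t cpu_addr;
	uint32_t len;
};

/*
 * MSI_doorbell_addr = cpu_addr + len + res.start + 0x7c
 *
 * The doorbell sits just past the RAM covered by dma-ranges, at the
 * controller's register offset (res_start) plus 0x7c.
 */
static uint32_t msi_doorbell_addr(const struct dma_range *r,
				  uint32_t res_start)
{
	return r->cpu_addr + r->len + res_start + MSI_DOORBELL_OFFSET;
}
```

With cpu_addr = 0x80000000, len = 0x20000000 and the assumed
res.start = 0x2e000, msi_doorbell_addr() returns 0xa002e07c.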

Regards.
