[PATCH v5 04/12] PCI: brcmstb: add dma-range mapping for inbound traffic

Robin Murphy robin.murphy at arm.com
Wed Sep 26 03:56:56 PDT 2018

On 26/09/18 03:59, Florian Fainelli wrote:
> On 9/24/2018 8:01 AM, Jim Quinlan wrote:
>> On Mon, Sep 24, 2018 at 4:25 AM Ard Biesheuvel
>> <ard.biesheuvel at linaro.org> wrote:
>>> On Fri, 21 Sep 2018 at 19:41, Jim Quinlan <jim2101024 at gmail.com> wrote:
>>>> On Thu, Sep 20, 2018 at 5:39 PM Florian Fainelli 
>>>> <f.fainelli at gmail.com> wrote:
>>>>> On 09/20/2018 02:33 PM, Ard Biesheuvel wrote:
>>>>>> On 20 September 2018 at 14:31, Florian Fainelli 
>>>>>> <f.fainelli at gmail.com> wrote:
>>>>>>> On 09/20/2018 02:04 PM, Ard Biesheuvel wrote:
>>>>>>>> On 20 September 2018 at 13:55, Florian Fainelli 
>>>>>>>> <f.fainelli at gmail.com> wrote:
>>>>>>>>> On 09/19/2018 07:19 PM, Ard Biesheuvel wrote:
>>>>>>>>>> On 19 September 2018 at 07:31, Jim Quinlan 
>>>>>>>>>> <jim2101024 at gmail.com> wrote:
>>>>>>>>>>> The Broadcom STB PCIe host controller is intimately related to the
>>>>>>>>>>> memory subsystem.  This close relationship adds complexity to how cpu
>>>>>>>>>>> system memory is mapped to PCIe memory.  Ideally, this mapping is an
>>>>>>>>>>> identity mapping, or an identity mapping off by a constant.  Not so in
>>>>>>>>>>> this case.
>>>>>>>>>>> Consider the Broadcom reference board BCM97445LCC_4X8 which has 6 GB
>>>>>>>>>>> of system memory.  Here is how the PCIe controller maps the
>>>>>>>>>>> system memory to PCIe memory:
>>>>>>>>>>>    memc0-a@[        0....3fffffff] <=> pci@[        0....3fffffff]
>>>>>>>>>>>    memc0-b@[100000000...13fffffff] <=> pci@[ 40000000....7fffffff]
>>>>>>>>>>>    memc1-a@[ 40000000....7fffffff] <=> pci@[ 80000000....bfffffff]
>>>>>>>>>>>    memc1-b@[300000000...33fffffff] <=> pci@[ c0000000....ffffffff]
>>>>>>>>>>>    memc2-a@[ 80000000....bfffffff] <=> pci@[100000000...13fffffff]
>>>>>>>>>>>    memc2-b@[c00000000...c3fffffff] <=> pci@[140000000...17fffffff]
>>>>>>>>>> So is describing this as
>>>>>>>>>> dma-ranges = <0x0 0x0 0x0 0x0 0x0 0x40000000>,
>>>>>>>>>>               <0x0 0x40000000 0x1 0x0 0x0 0x40000000>,
>>>>>>>>>>               <0x0 0x80000000 0x0 0x40000000 0x0 0x40000000>,
>>>>>>>>>>               <0x0 0xc0000000 0x3 0x0 0x0 0x40000000>,
>>>>>>>>>>               <0x1 0x0 0x0 0x80000000 0x0 0x40000000>,
>>>>>>>>>>               <0x1 0x40000000 0x0 0xc0000000 0x0 0x40000000>;
>>>>>>>>>> not working for you? I haven't tried this myself, but since DT permits
>>>>>>>>>> describing the inbound mappings this way, we should fix the code if it
>>>>>>>>>> doesn't work at the moment.
>>>>>>>>> You mean encoding the memory controller index in the first cell? If that
>>>>>>>>> works, that's indeed a much cleaner solution, though is it standard
>>>>>>>>> compliant in any form?
>>>>>>>> No those are just memory addresses (although I may have screwed up the
>>>>>>>> order). From Documentation/devicetree/booting-without-of.txt:
>>>>>>>> """
>>>>>>>> Optional property:
>>>>>>>> - dma-ranges: <prop-encoded-array> encoded as arbitrary number of triplets of
>>>>>>>>          (child-bus-address, parent-bus-address, length). Each triplet specified
>>>>>>>>          describes a contiguous DMA address range.
>>>>>>>> """
>>>>>>> Then I am confused by your comment, that's what this patch does, it adds
>>>>>>> support for reading "dma-ranges" from Device Tree and setting up inbound
>>>>>>> windows using that. The only caveat is that because the PCIe root
>>>>>>> complex has some ties with the memory bus architecture it is connected
>>>>>>> to (SCB in our case) there is still a requirement to know the
>>>>>>> translation between a given physical address and its backing memory
>>>>>>> controller/aperture.
>>>>>> Ah ok, apologies for the noise then.
>>>>>> I was hoping that having working support for dma-ranges would remove
>>>>>> the need for the special phys<->dma conversion routines.
>>>>> What you describe definitely works with platform devices, but I am not
>>>>> sure this is working for PCIe devices, although, conceptually it should,
>>>>> yes.
>>>> Sorry for my delay in responding.  One problem is that
>>>> of_dma_configure() only looks at the first dma-range given and then
>>>> converts it to dev->dma_pfn_offset which is respected by the DMA API.
>>>> However, we often have multiple dma-ranges, not just one.  This is the
>>>> big issue.
>>> Given the recent attention to getting these APIs in shape, this may be
>>> something Robin or Christoph may care to look into?
>> It looks like this has been brought up before in the "[RFC PATCH] of:
>> Fix DMA configuration for non-DT masters" thread aka
>> https://lists.linuxfoundation.org/pipermail/iommu/2017-April/021325.html
>> In the thread "Oza Oza", a Broadcom coworker probably dealing with the
>> same exact problem as I,  enumerates three problems.   #1 and #2 are
>> the exact same ones I've just given: the "dma-ranges" prop of the RC
>> DT node is "skipped", and of_dma_get_range() only considers the first
>> entry in any "dma-ranges".
> Robin, is that something that is expected or should the "dma-ranges" 
> somehow propagate from host bridge down the PCIe end-point drivers?

Nope, the code is most definitely incomplete - it's sufficient to 
support the system it was originally needed for (i.e. platform devices 
with a single range), but can by no means even pretend to support the 
binding fully. Furthermore, the way that PCI support was later grafted 
into of_dma_configure() was *only* in support of dma-coherent without 
consideration for dma-ranges. Hence the current mess.
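
To make the single-range limitation concrete, here is a minimal userspace
sketch (not kernel code) contrasting a one-offset translation with a table
lookup over the six windows quoted earlier in this thread. The struct and
function names (dma_window, phys_to_dma_multi, phys_to_dma_single) are
hypothetical and exist only for illustration; the addresses are taken from
the BCM97445LCC_4X8 map above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One inbound window: CPU physical base, PCI (DMA) base, and size. */
struct dma_window {
	uint64_t cpu_base;
	uint64_t pci_base;
	uint64_t size;
};

/* The six 1 GB windows from the memory map quoted in this thread. */
static const struct dma_window windows[] = {
	{ 0x000000000ULL, 0x000000000ULL, 0x40000000ULL }, /* memc0-a */
	{ 0x100000000ULL, 0x040000000ULL, 0x40000000ULL }, /* memc0-b */
	{ 0x040000000ULL, 0x080000000ULL, 0x40000000ULL }, /* memc1-a */
	{ 0x300000000ULL, 0x0c0000000ULL, 0x40000000ULL }, /* memc1-b */
	{ 0x080000000ULL, 0x100000000ULL, 0x40000000ULL }, /* memc2-a */
	{ 0xc00000000ULL, 0x140000000ULL, 0x40000000ULL }, /* memc2-b */
};

/*
 * Hypothetical multi-range translation: find the window containing
 * phys and apply that window's own offset. Returns false if phys is
 * not covered by any window.
 */
static bool phys_to_dma_multi(uint64_t phys, uint64_t *dma)
{
	assert(dma != NULL);
	for (size_t i = 0; i < sizeof(windows) / sizeof(windows[0]); i++) {
		const struct dma_window *w = &windows[i];

		if (phys >= w->cpu_base && phys - w->cpu_base < w->size) {
			*dma = phys - w->cpu_base + w->pci_base;
			return true;
		}
	}
	return false;
}

/*
 * Single-offset model: only the first window is consulted, which is
 * roughly what a lone dma_pfn_offset can express.
 */
static uint64_t phys_to_dma_single(uint64_t phys)
{
	return phys - windows[0].cpu_base + windows[0].pci_base;
}
```

With this table, CPU address 0x100001000 (in memc0-b) translates to PCI
address 0x40001000 through the lookup, whereas the single-offset model
leaves it unchanged at 0x100001000, which is the wrong bus address for
this board.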

>> Thanks, Jim
>>> In any case, the description of dma-ranges should be in sync with the
>>> way Linux interprets it, so this is either a documentation bug or a
>>> DMA layer bug.
>>>> There is another issue with of_dma_configure() being invoked by the EP
>>>> driver on "bridge->parent->of_node", which is our RC device.
>>>> of_dma_configure() calls of_dma_get_range() on the of_get_next_parent() of
>>>> our RC's device node and this misses the dma-ranges property which is
>>>> contained within the RC.  I think I could work around this but there is
>>>> no getting around the first problem.
>>> IIUC dma-ranges should be added to the parent bus of a device, which I
>>> guess is slightly ambiguous for a root complex that incorporates a
>>> host bridge.
> Humm, why is that ambiguous for a host bridge/root complex?

The real problem is that FDT machines don't describe the PCI hierarchy 
in DT as proper OF does, so we have this awkward crossing between the DT 
model and the Linux device model where the devices have no DT 
representation and the "parent bus" is a DT leaf node, which cocks up 
the way the current code is expecting to work.
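
As a toy model of that mismatch (again a userspace sketch, not the kernel's
actual lookup), consider a node tree where the search for "dma-ranges"
starts at the parent of the node it is handed, as described earlier in the
thread. The types and names here (toy_node, find_ranges_from_parent,
find_ranges_inclusive) are hypothetical, and the inclusive variant is shown
only for contrast, not as a proposed fix.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a DT node: a name, a parent link, and a flag. */
struct toy_node {
	const char *name;
	const struct toy_node *parent;
	bool has_dma_ranges;
};

/*
 * Search strictly above np. If np is the leaf node that carries
 * "dma-ranges" (the host bridge), the property is never seen.
 */
static const struct toy_node *
find_ranges_from_parent(const struct toy_node *np)
{
	for (np = np ? np->parent : NULL; np; np = np->parent)
		if (np->has_dma_ranges)
			return np;
	return NULL;
}

/* Contrast case: the search also considers np itself. */
static const struct toy_node *
find_ranges_inclusive(const struct toy_node *np)
{
	for (; np; np = np->parent)
		if (np->has_dma_ranges)
			return np;
	return NULL;
}
```

If the host bridge node is the DT leaf holding "dma-ranges" and the
endpoint has no node of its own, handing the bridge node to the first
function finds nothing, while the second finds the bridge node itself.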

