[PATCH 24/32] pci: PCIe driver for Marvell Armada 370/XP systems
Mitch Bradley
wmb at firmworks.com
Mon Mar 11 14:04:42 EDT 2013
On 3/10/2013 9:46 PM, Thierry Reding wrote:
> On Sun, Mar 10, 2013 at 07:46:04PM -1000, Mitch Bradley wrote:
>> On 3/9/2013 8:55 PM, Jason Gunthorpe wrote:
>>> On Sat, Mar 09, 2013 at 06:52:13PM -1000, Mitch Bradley wrote:
>>>
>>>> Okay, I think I finally get it. The Marvell root port bridge setup
>>>> registers look like standard config headers, even though they aren't
>>>> really in config space (because you need a different access method for
>>>> them, compared to real config accesses to downstream devices).
>>>
>>> Well, thats pretty close. On Marvell the MMIO space has a lot of
>>> stuff, some of it is config related, lots of it isn't. In fact most of
>>> it isn't. Tegra is different, the MMIO region is exactly the config
>>> space of the root port bridge.
>>>
>>> Tegra could probably happily and sanely do as you've described (well,
>>> see below), but it is wonky for Marvell. It looks like there are more
>>> drivers coming that have this need - so I had hoped for a general
>>> answer to the 'per-port MMIO space' problem that is unrelated to the
>>> MMIO spaces role in config access.
>>>
>>>> So, in order for the common code that enumerates the PCI bus to work,
>>>> you need to "spoof" the config accesses so that when you try to access
>>>> those particular "pseudo config headers", it uses the MMIO method
>>>> instead of the real config access method.
>>>
>>> There are rules for config access in both Tegra and Marvell that are
>>> not trivial. Again, the driver takes are of this and all Linux sees is
>>> a nice compliant config space.
>>
>> Thanks for explaining that.
>>
>> So it seems that we are faced with two requirements that are somewhat at
>> odds with one another.
>>
>> 1) Some of the root port bridge registers have to be accessed via config
>> space access functions so common PCI enumeration code will work. To
>> represent this with the usual DT structure, the top root-complex needs
>> to define a 3/2 address space so its children
>> can have standard PCI reg properties. Presumably, if those registers
>> are being programmed by common code, Marvell-specific code would
>> restrict its role to setting up config-space access functions, leaving
>> the actual touching of the registers to the common code.
>>
>> 2) Marvell chips have additional non-standard per-root-port registers
>> that generic PCI code would not understand. These registers would be
>> touched only by Marvell-specific code.
>>
>> The two kinds of registers are adjacent in MMIO space. However, unless
>> I am misunderstanding this MV78230 manual, the highest "config header"
>> register index is 0x134, well below the 0x1000 size limit of a PCIe
>> config header. Some of the extra registers are at 0x8xx, and others are
>> above 0x1800.
>>
>> That suggests the following:
>>
>> For the "config header" registers that use generic PCI code, use a
>> ranges entry to associate (pass through) MMIO addresses to the config
>> header portion of the register block. This parameterizes the code that
>> sets up the special-case PCI config access. That specification
>> technique is general and could be used not only for the Marvell case,
>> but also for any other chip that has some number of direct-mapped config
>> headers.
>>
>> For requirement (2), the top node has a reg property listing the
>> portions of the address space that are consumed by the driver at the top
>> level instead of being passed through to the PCI addressing domain. E.g.
>>
>> reg = <0xd0040800 0x1800>, <0xd0080800 0x1800>;
>>
>> Thus we have accurately described the two aspects of the true situation.
>> The PCI-compliant "config header" registers are passed through to the
>> child nodes where they can be dealt with in the normal PCI fashion. The
>> non-PCI-compliant register footprint lives within the Marvell-specific
>> root-complex driver.
>>
>> The root-complex driver presumably needs associate non-compliant
>> register blocks with specific child nodes. That can be done by
>> requiring that the reg entries are in the same order as the config-space
>> ranges entries.
>>
>> Anticipating the possible objection that ARM's 0x1000-byte page size
>> does not permit virtual-to-physical mapping at 0x800-byte granularity:
>> The device tree does not guarantee that reg entries are page-aligned; it
>> simply tries to describe the reality, even though it might be messy.
>>
>>>
>>>> If that is indeed the case, them I would vote for a slight modification
>>>> of the intermediate patch that I cited earlier - the one in which the
>>>> ranges property includes translations from those special config
>>>> addresses into CPU addresses. The modification is to fix the sizes,
>>>> changing 0x2000 to 0x800, so those ranges entries do not overlap in the
>>>> child address domain.
>>
>> BTW I have completely changed my mind about the overlap thing.
>>
>> I said that it was bogus to use size=0x2000 for a config space header.
>> That was based on an interpretation - which I now dislike - of the
>> meaning of 3-cell config addresses. By that old interpretation,
>> size=0x800 would also be bogus, because bits 10-8 of the config address
>> are for the function number.
>>
>> Consider the following question, which I have never previously
>> considered, at least not explicitly:
>>
>> Q: What would be the 3-cell representation of the Command/Status
>> register address (offset 4) in device 1, function 1?
>>
>> One obvious - but weak - answer is <0x00000904 0 0> . It's obvious
>> because the value 0x80000904 is what you put in the cf8 indirect access
>> address register. But it is weak because it unnecessarily restricts the
>> config header size, and because it works differently than offsets
>> applied to memory and I/O space addresses. Furthermore, coupling the
>> offset to the cf8 register is dodgy because of the funniness where you
>> have to but the byte-selector bits 1..0 not in the address register cf8,
>> but rather in the data register cfc.
>>
>> The better answer is <0x00000900 0 4>, using the third cell for the
>> offset, as with IO and memory space offsets.
>>
>> Under this better interpretation, config space sizes larger than 0x100
>> are no problem. The first cell is just a selector. The size is "added"
>> to the third cell, so it does not encroach into the first cell's
>> device,function bits.
>>
>> This better interpretation easily handles PCIe's 0x1000-byte extended
>> config headers, and trivially accomodates even larger sizes should a
>> future PCI variant further increase the header size. as described way
>> down below.
>
> That makes a lot of sense. It also mirrors parts of a discussion we
> previously had when we first discussed a DT binding for Tegra PCIe.
>
> Thanks for taking the time to go over this in so much detail. I very
> much appreciate it.
>
>>>> #address-cells = <3>;
>>>> #size-cells = <2>;
>>>>
>>>> bus-range = <0x00 0xff>;
>>>>
>>>> ranges = <0x00000800 0 0 0xd4004000 0 0x00000800 /* Root
>>>> port config header */
>>>> 0x00001000 0 0 0xd4008000 0 0x00000800 /* Root
>>>> port config header */
>>>> 0x82000000 0 0xe0000000 0xe0000000 0 0x08000000 /*
>>>> non-prefetchable memory */
>>>> 0x81000000 0 0 0xe8000000 0 0x00100000>; /*
>>>> downstream I/O */
>>>>
>>>> pcie at 1,0 {
>>>> device_type = "pci";
>>>> reg = <0x0800 0 0 0 0>;
>>>
>>> Okay, this was Thierry's original idea, and it was my note that this
>>> didn't seem like a good choice. I'm not wedded to that, but I'll
>>> explain my reasoning a bit.
>>>
>>> Two things bothered me
>>> - Describing a CPU MMIO mapping with a config address space seems
>>> wonky
>>
>> I agree that mapping config space is sort of a jarring concept, but I
>> think that's because PCs have polluted the mindspace, not because there
>> is anything inherently bad about it. The original PCI spec
>> fundamentally treated config space as just another (rather segmented)
>> linear address space. It specified the cf8/cfc indirect access
>> mechanism not as a deep semantic feature, but rather as an
>> implementation hack to work around addressing limitations of PC chipsets
>> and the addressing-deficient popular OS of the time (Windows 3.1 which
>> was a veneer over DOS). The PCI spec defined configuration mechanism #2
>> as a direct map into PC I/O space. RISC chipsets of that era (e.g.
>> Alpha, Power PC) often direct-mapped config space into some form of
>> load/store space.
>>
>> But PCs soon encroached on the RISC market and RISC system vendors
>> started to "leverage" PC I/O chipsets. The indirect access "config
>> mechanism #1" became part of the mindset. Indirect access was hardcoded
>> in enough software that chipset designers would provide indirect access
>> even if direct-mapping was possible.
>>
>> Fundamentally, a config header is just a block of registers, addressable
>> as an index relative to some base. It might require some specific
>> access path - but that is true with any block of registers. Even
>> direct-mapped MMIO registers often require specific semantics such as
>> non-cached, non-write-buffered, or special address space ids.
>>
>> The PCI SIG expected PCIe extended config space to be direct-mapped -
>> see slide 25 of
>>
>> http://www.pcisig.com/specifications/pciexpress/technical_library/dev_con_09_02/specifications/pciexpress/PCIExpress_SoftwareandConfigurationModel.pdf
>>
>> So mappability of config space was intended from the get-go. The real
>> wonky thing is the "indirect access zombie that cannot die".
>>
>>> - Linux's OF core doesn't parse a 'reg' property under
>>> 'device_type = "pci"' as something CPU mappable, it only looks
>>> to assigned-addresses for that. So the OF core would need to learn
>>> how to handle this case, and it would have to be well defined..
>>
>> The way I envisioned it working is that the root-complex node defines
>> its own config space access functions (the "ops" argument to
>> pci_scan_root_bus). For config addresses listed in ranges, the
>> functions do MMIO. For others, they use the chipset's indirect-access
>> registers. The OF core is not involved.
>>
>>>
>>> Thierry's original patch avoided this problem by not using
>>> device_type = "pci"..
>
> I think the OF core still needs to be involved in order to translate the
> reg = <0x0800 0 0 0 0>; entry into a resource that can be ioremap()'ed.
> As Jason already said, the OF core only does that for assigned-addresses
> when device_type = "pci".
>
> Otherwise the pci_ops implementation still couldn't access the MMIO
> registers.
I'm not sure I understand your point that <0x0800 ...> needs to be
translated into a resource.
(a) Existing code that needs to match a child nodes to a discovered PCI
hardware device uses the existing of_pci_find_child_device(parent,
devfn) from of_pci.c . That routine "just works", because the reg =
<0x0800 ..> entry is standard.
(b) The discovery/enumeration code needs to access config space via
pci_ops. The root complex driver implements pci_ops based on a trivial
parsing of ranges (omitting irrelevant details):
pci_op_read(devfn, pos) {
loop_over_ranges_entries {
if (to_devfn(ranges_entry.child) == devfn) {
return mmio_read(ranges_entry.parent + pos);
}
return indirect_read(devfn, pos);
}
The actual implementation might setup different data structures, but the
pseudocode captures the essential points: The input argument is devfn,
and a simple scan of ranges controls whether to use MMIO or indirect
access for a given devfn value. Every aspect of the scanning loop is
trivial. The loop itself is just a stride-5 walk over an array of integers.
"ranges_entry.parent" is be32_to_cpu(ranges_entry[3])
and
"to_devfn(ranges_entry.child)" is "be32_to_cpup(ranges_entry[0]) >> 8"
(you don't want to mask the result in this case because not masking
prevents non-config-space ranges entries from matching devfn.)
> Aiming for a driver-specific solution doesn't seem like a
> good idea but if the functionality is common enough to be required by
> two or more drivers perhaps a new helper could be created for exactly
> this purpose.
Indeed - but it is often best to wait and see rather than guessing about
what sort of common helper is needed. I usually guess wrong in such cases.
>
> Perhaps another alternative would be to extend the OF core to translate
> the entries in the reg property as well. Or maybe I misunderstood what
> you said.
>
> Thierry
>
More information about the linux-arm-kernel
mailing list