[PATCH v2] PCI: rockchip: Avoid accessing PCIe registers with clocks gated

Robin Murphy robin.murphy at arm.com
Thu Jun 24 16:51:16 PDT 2021


On 2021-06-25 00:28, Bjorn Helgaas wrote:
> On Fri, Jun 25, 2021 at 12:18:48AM +0100, Robin Murphy wrote:
>> On 2021-06-24 22:57, Bjorn Helgaas wrote:
>>> On Tue, Jun 08, 2021 at 10:04:09AM +0200, Javier Martinez Canillas wrote:
>>>> IRQ handlers that are registered for shared interrupts can be called at
>>>> any time after they have been registered using the request_irq() function.
>>>>
>>>> It's up to drivers to ensure that it's always safe for these handlers to be called.
>>>>
>>>> Both the "pcie-sys" and "pcie-client" interrupts are shared, but since
>>>> their handlers are registered very early in the probe function, an error
>>>> later can lead to these handlers being executed before all the required
>>>> resources have been properly set up.
>>>>
>>>> For example, the rockchip_pcie_read() function used by these IRQ handlers
>>>> expects that some PCIe clocks will already be enabled; otherwise, trying
>>>> to access the PCIe registers causes the read to hang and never return.
>>>
>>> The read *never* completes?  That might be a bit problematic because
>>> it implies that we may not be able to recover from PCIe errors.  Most
>>> controllers will time out eventually, log an error, and either
>>> fabricate some data (typically ~0) to complete the CPU's read or cause
>>> some kind of abort or machine check.
>>>
>>> Just asking in case there's some controller configuration that should
>>> be tweaked.
>>
>> If I'm following correctly, that'll be a read transaction to the native side
>> of the controller itself; it can't complete that read, or do anything else
>> either, because it's clock-gated, and thus completely oblivious (it might be
>> that if another CPU was able to enable the clocks then everything would
>> carry on as normal, or it might end up totally deadlocking the SoC
>> interconnect). I think it's safe to assume that in that state nothing of
>> importance would be happening on the PCIe side, and even if it was we'd
>> never get to know about it.
> 
> Oh, right, that makes sense.  I was thinking about the PCIe side, but
> if the controller itself isn't working, of course we wouldn't get that
> far.
> 
> I would expect that the CPU itself would have some kind of timeout for
> the read, but that's far outside of the PCI world.

Nah, in AMBA I'm not sure if it's even legal to abandon a transaction 
without waiting for the handshake to complete. If you're lucky the 
interconnect might have a clock/power domain bridge which can reply with 
an error when it knows its other side isn't running; otherwise, the 
initiator will just happily sit there waiting for a response to come 
back "in a timely manner" :)

Robin.


