[PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled

Sinan Kaya okaya at codeaurora.org
Tue Nov 21 08:43:32 PST 2017


On 11/21/2017 11:25 AM, David Laight wrote:
>> The DPC on the other hand stops the drivers immediately since HW took care of
>> link disable. (Endpoint register reads return ~0 at this point.)
> What happens if the 'user' driver doesn't define the error reporting callbacks?
> It might be hardened against the ~0u returns from reads - so not OOPS.
> It might be appropriate to call the remove() function instead.

This is what the DPC driver does in its interrupt handler. 

http://elixir.free-electrons.com/linux/latest/ident/interrupt_event_handler

My understanding is that this will eventually call the remove() function on the
endpoint driver eventually. 

Bjorn had concerns that we are not calling the error handler if registered and
then calling remove() callback while the driver is in the middle of something
could be bad. 

He had concerns if remove() would leave something in a bad state so recovery
would really not work at all and kernel crashes eventually due to data corruption. 

Oza and I are looking for a way to plumb DPC's error handling into AER driver
so that PCI framework has a single place to look for error handling.

for dpc:
1. If an error handler registered, call it for all children devices
2. Remove all children devices from the bus
3. Recover the link with DPC
4. Rescan the entire bus and install the drivers again

> 
>> DPC driver clears
>> the interrupt from the DPC capability and brings the link up at the end. Full
>> enumeration/rescan follows this procedure to go back to functioning state.
> That might not be a good idea, very likely it will fail again immediately.

We can add a policy parameter and not bring up the link if you want to do
troubleshooting at the point of failure or have a way to define how the
system response should be. 

DPC causes a hot reset on the bus. Endpoint should go to reset state and we should
be able to bring up the link without any problems under normal circumstances.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.



More information about the linux-arm-kernel mailing list