[PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled

Sinan Kaya okaya at codeaurora.org
Thu Nov 16 06:03:37 PST 2017


Hi Bjorn,

On 11/15/2017 4:14 PM, Bjorn Helgaas wrote:
>> +	if (pcie_port_query_uptream_service(dev, PCIE_PORT_SERVICE_DPC)) {
>> +		dev_info(&dev->dev, "AER: Device recovery to be done by DPC\n");
>> +		return;
>> +	}
> What happens without this test?
> 
> Does AER read registers from the now-disabled device and get ~0 data?
> Or is AER reading registers from the port upstream from the disabled
> device and trying to reset the device?
> 
> It looks like get_device_error_info() reads registers and doesn't
> check to see whether it gets ~0 back.  I'm wondering if we *should* be
> checking there and whether doing that would help mitigate the issue
> here.

The issue is that two independent software entities are trying to recover the
PCIe link simultaneously. AER and DPC take different approaches to link recovery.

AER makes a callback into the endpoint driver for non-fatal errors and hopes
that the endpoint driver can recover the link. AER also makes a callback in the
fatal error case, but then resets the link via a secondary bus reset.

DPC, on the other hand, stops the drivers immediately since the HW has already
disabled the link. (Endpoint register reads return ~0 at this point.) The DPC
driver clears the interrupt in the DPC capability and brings the link up at the
end. A full enumeration/rescan follows to get back to a functioning state.

Without this AER-DPC coordination, the endpoint driver gets confused: it
receives a stop command and a recover command at about the same time, in an
order that depends on timing.
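
To illustrate the coordination, here is a rough user-space model of the check
in the hunk quoted above. It is only a sketch: struct model_dev, the
MODEL_SERVICE_* bits and both helpers are made-up names, not kernel API, and
walking every upstream port (rather than only the nearest one) is an
assumption that may not match what the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the port service bits (not the kernel's values). */
#define MODEL_SERVICE_AER (1u << 0)
#define MODEL_SERVICE_DPC (1u << 1)

/* Hypothetical, heavily simplified stand-in for a PCI device. */
struct model_dev {
	struct model_dev *upstream;	/* parent port, NULL at the root */
	unsigned int services;		/* services bound to this port */
};

/* Assumption: report whether any port above dev runs the given service. */
bool model_upstream_has_service(const struct model_dev *dev,
				unsigned int service)
{
	const struct model_dev *p;

	for (p = dev->upstream; p; p = p->upstream)
		if (p->services & service)
			return true;
	return false;
}

/* AER runs its own recovery only when no upstream port will do it via DPC. */
bool model_aer_should_recover(const struct model_dev *dev)
{
	return !model_upstream_has_service(dev, MODEL_SERVICE_DPC);
}
```

With a check like this, exactly one of the two recovery paths owns a given
error, so the endpoint driver sees either the AER callbacks or the DPC
stop/rescan sequence, not both.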

Whether the AER driver reads ~0 or not really depends on timing. For example,
the DPC driver may already have brought the link back up by the time the AER
driver gets here.
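
As a sketch of why a ~0 check alone would not settle this, here is a small
user-space model. The struct model_link type and both functions are
hypothetical; the only thing taken from the discussion above is that reads
return ~0 while the link is disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones is what a read returns when the device does not respond. */
#define MODEL_READ_ALL_ONES UINT32_C(0xFFFFFFFF)

/* Hypothetical model of a link and one error status register behind it. */
struct model_link {
	bool up;		/* false while DPC holds the link disabled */
	uint32_t status;	/* register contents when reachable */
};

/* Simulated register read: returns ~0 while the link is down. */
uint32_t model_read_status(const struct model_link *link)
{
	return link->up ? link->status : MODEL_READ_ALL_ONES;
}

/* Store the value and return true only if the read does not look like ~0. */
bool model_read_status_checked(const struct model_link *link, uint32_t *val)
{
	uint32_t v = model_read_status(link);

	if (v == MODEL_READ_ALL_ONES)
		return false;
	*val = v;
	return true;
}
```

The checked read fails only while the link is still down. Once DPC has brought
the link back up, the same read succeeds, so the check by itself does not tell
AER that DPC is already handling the recovery.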

Bad things do happen. We have seen this with the e1000e driver.

Sinan

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.


