[PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled

Bjorn Helgaas helgaas at kernel.org
Wed Nov 15 13:14:47 PST 2017


On Wed, Nov 15, 2017 at 10:26:48AM +0530, Oza Pawandeep wrote:
> PCI Express Base Specification, Rev. 4.0 Version 0.9
> 6.2.10: Downstream Port Containment (DPC)
> 
> DPC is an optional normative feature of a Downstream Port. DPC halts PCI
> Express traffic below a Downstream Port after an unmasked uncorrectable
> error is detected at or below the Port, avoiding the potential spread of
> any data corruption, and permitting error recovery if supported by
> software
> 
> Triggering DPC disables its Link by directing the LTSSM to the Disabled
> state. Once the LTSSM reaches the Disabled state, it remains in that
> state until the DPC Trigger Status bit is Cleared
> 
> So when DPC service is active and registered to port driver, AER should
> not attempt to recover, since DPC will be removing downstream devices,
> and do the recovery.
> 
> Signed-off-by: Oza Pawandeep <poza at codeaurora.org>
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 7448052..a9108ea 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -482,6 +482,27 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>  }
>  
>  /**
> + * pcie_port_query_uptream_service - query upstream service
> + * @dev: pointer to a pci_dev data structure of agent detecting an error
> + * @service: service to be queried
> + *
> + * Invoked to know the status of the service for pci device.
> + */
> +static bool pcie_port_query_uptream_service(struct pci_dev *dev, u32 service)
> +{
> +	struct pci_dev *upstream_dev = dev;
> +
> +	do {
> +		if (pcie_port_query_service(upstream_dev, service))
> +			return true;
> +		upstream_dev = pcie_port_upstream_bridge(upstream_dev);
> +	} while (upstream_dev);
> +
> +	return false;
> +}
> +
> +
> +/**
>   * do_recovery - handle nonfatal/fatal error recovery process
>   * @dev: pointer to a pci_dev data structure of agent detecting an error
>   * @severity: error severity type
> @@ -495,6 +516,18 @@ static void do_recovery(struct pci_dev *dev, int severity)
>  	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
>  	enum pci_channel_state state;
>  
> +	/*
> +	 * If DPC is enabled, there is no need to attempt recovery.
> +	 * Since DPC disables its Link by directing the LTSSM to
> +	 * the Disabled state.
> +	 * DPC driver will take care of the recovery, there is no need
> +	 * for AER driver to race.
> +	 */
> +	if (pcie_port_query_uptream_service(dev, PCIE_PORT_SERVICE_DPC)) {
> +		dev_info(&dev->dev, "AER: Device recovery to be done by DPC\n");
> +		return;
> +	}

What happens without this test?

Does AER read registers from the now-disabled device and get ~0 data?
Or is AER reading registers from the port upstream from the disabled
device and trying to reset the device?

It looks like get_device_error_info() reads registers and doesn't
check to see whether it gets ~0 back.  I'm wondering if we *should* be
checking there and whether doing that would help mitigate the issue
here.

I don't really like the pcie_port_query_uptream_service() approach
(BTW, the name is misspelled) because it feels a little ad hoc and it
makes assumptions here in the AER code about what the DPC code is
doing, e.g., how it has configured PCI_EXP_DPC_CTL.

>  	if (severity == AER_FATAL)
>  		state = pci_channel_io_frozen;
>  	else
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
> a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



More information about the linux-arm-kernel mailing list