[PATCH v2] ath10k: Retry pci probe on failure.

Ben Greear greearb at candelatech.com
Fri Oct 13 13:41:15 PDT 2017


On 10/13/2017 08:50 AM, Adrian Chadd wrote:
> On 13 October 2017 at 05:41, Kalle Valo <kvalo at qca.qualcomm.com> wrote:
>> greearb at candelatech.com writes:
>>
>>> From: Ben Greear <greearb at candelatech.com>
>>>
>>> This works around a problem we see when sometimes the wifi NIC does
>>> not respond the first time.  This seems to happen especially often on
>>> some of the 9984 NICs in mid-range platforms.
>>>
>>> Signed-off-by: Ben Greear <greearb at candelatech.com>
>>
>> [...]
>>
>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>> -                         const struct pci_device_id *pci_dev)
>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>> +                           const struct pci_device_id *pci_dev)
>>>   {
>>>        int ret = 0;
>>>        struct ath10k *ar;
>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>        return ret;
>>>   }
>>>
>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>> +                         const struct pci_device_id *pci_dev)
>>> +{
>>> +     int cnt = 0;
>>> +     int rv;
>>> +     do {
>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>> +             if (rv == 0)
>>> +                     return rv;
>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>> +     } while (cnt++ < 10);
>>> +     return rv;
>>> +}
>>
>> This is a sledgehammer approach and it causes reload for all error
>> cases, like when hardware is broken or memory allocation is failing.
>>
>> When the problem happens does it always fail at the same place? Is
>> it hw reset or something else? It's better to retry the individual action
>> than to do this hack. Or is it just some more delay needed somewhere?
>
> I am seeing WMI timeouts during initial firmware load and wait on
> QCA9984 + BCM7444S SoC.
> My guess is the WMI wakeup time is not "right" enough and needs to be
> extended a little bit.
>
> But then, I have played a lot of whackamole with WMI timeouts during
> my loooong porting effort..

The failure I saw was a failure to wake PCI.  From the comments, it seems that the
current wait is already longer than what should be required, and the driver warns on
slow wakes.  I never saw that warning, so I assume that waiting longer would not help.

For instance, when testing my patch I saw it fail to wake PCI twice in a row and
then succeed on the third try.

As for a big hammer, I guess we could check for certain return codes if you think
that is better than just retrying all failures?
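
To show roughly what I mean by checking return codes, here is a small userspace
sketch, not driver code.  Everything in it is hypothetical: probe_with_retry(),
probe_error_is_transient() and the fake_probe helper are made-up names, and treating
-ETIMEDOUT as "the wake failed, worth retrying" is an assumption I have not checked
against what the pci wake path actually returns.  The delay between tries is left
out, since it would be kernel-specific.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: only timeouts are treated as transient.  The real set of
 * retryable codes would need to be confirmed against the driver. */
static bool probe_error_is_transient(int err)
{
	return err == -ETIMEDOUT;
}

typedef int (*probe_fn)(void *ctx);

/* Call probe() up to max_tries times.  Stop on success, or on the first
 * error that is not considered transient (e.g. -ENOMEM, broken hardware). */
static int probe_with_retry(probe_fn probe, void *ctx, int max_tries)
{
	int rv = -EINVAL;	/* returned as-is if max_tries <= 0 */
	int i;

	assert(probe != NULL);

	for (i = 0; i < max_tries; i++) {
		rv = probe(ctx);
		if (rv == 0 || !probe_error_is_transient(rv))
			break;
		/* a short delay would go here in the driver */
	}
	return rv;
}

/* Hypothetical stand-in for the real probe: fails with fail_err for the
 * first failures_left calls, then succeeds.  Counts how often it ran. */
struct fake_probe {
	int failures_left;
	int fail_err;
	int calls;
};

static int fake_probe(void *ctx)
{
	struct fake_probe *fp = ctx;

	fp->calls++;
	if (fp->failures_left > 0) {
		fp->failures_left--;
		return fp->fail_err;
	}
	return 0;
}
```

With this shape, a probe that times out twice and then works is called three times,
while something like -ENOMEM is returned after the first call with no retries.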

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



