[PATCH v2] ath10k: Retry pci probe on failure.

Kalle Valo kvalo at qca.qualcomm.com
Tue Oct 17 01:45:55 PDT 2017


Ben Greear <greearb at candelatech.com> writes:

> On 10/13/2017 08:50 AM, Adrian Chadd wrote:
>> On 13 October 2017 at 05:41, Kalle Valo <kvalo at qca.qualcomm.com> wrote:
>>> greearb at candelatech.com writes:
>>>
>>>> From: Ben Greear <greearb at candelatech.com>
>>>>
>>>> This works around a problem we see when sometimes the wifi NIC does
>>>> not respond the first time.  This seems to happen especially often on
>>>> some of the 9984 NICs in mid-range platforms.
>>>>
>>>> Signed-off-by: Ben Greear <greearb at candelatech.com>
>>>
>>> [...]
>>>
>>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> -                         const struct pci_device_id *pci_dev)
>>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>>> +                           const struct pci_device_id *pci_dev)
>>>>   {
>>>>        int ret = 0;
>>>>        struct ath10k *ar;
>>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>>        return ret;
>>>>   }
>>>>
>>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> +                         const struct pci_device_id *pci_dev)
>>>> +{
>>>> +     int cnt = 0;
>>>> +     int rv;
>>>> +     do {
>>>> +             rv = __ath10k_pci_probe(pdev, pci_dev);
>>>> +             if (rv == 0)
>>>> +                     return rv;
>>>> +             pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>>> +             mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>>> +     } while (cnt++ < 10);
>>>> +     return rv;
>>>> +}
>>>
>>> This is a sledgehammer approach and it causes reload for all error
>>> cases, like when hardware is broken or memory allocation is failing.
>>>
>>> When the problem happens does it always fail at the the same place? Is
>>> it hw reset or something else? It's better to retry the invidiual action
>>> than to do this hack. Or is it just some more delay needed somewhere?
>>
>> I am seeing WMI timeouts during initial firmware load and wait on
>> QCA9984 + BCM7444S SoC.
>> My guess is the WMI wakeup time is not "right" enough and needs to be
>> extended a little bit.
>>
>> But then, I have played a lot of whackamole with WMI timeouts during
>> my loooong porting effort..
>
> The failure I saw was a failure to wake pci, and from comments, it seems that
> the current wait is longer than what should be required, and it warns on slow
> wakes, and I never saw that warning.  So I assume that waiting longer would not help.
>
> I saw it fail twice in a row to wake pci and then succeed on the third
> try, for instance,
> when testing my patch.
>
> As for a big hammer, I guess we could check for certain return codes if you think
> that is better than just retrying all failures?

ath10k_pci_probe() has a lots of stuff which should not affect your
problem, like allocating memory, setting up timers and interrupts etc.
It's quite ugly to redo that in every cycle. A more fine grained
solution, like looping specific action (reset, wake whatever) is much
more preferred.

Do you have debug logs of failing cases?

-- 
Kalle Valo


More information about the ath10k mailing list