[PATCH v2] ath10k: Retry pci probe on failure.
Kalle Valo
kvalo at qca.qualcomm.com
Tue Oct 17 01:45:55 PDT 2017
Ben Greear <greearb at candelatech.com> writes:
> On 10/13/2017 08:50 AM, Adrian Chadd wrote:
>> On 13 October 2017 at 05:41, Kalle Valo <kvalo at qca.qualcomm.com> wrote:
>>> greearb at candelatech.com writes:
>>>
>>>> From: Ben Greear <greearb at candelatech.com>
>>>>
>>>> This works around a problem we see when sometimes the wifi NIC does
>>>> not respond the first time. This seems to happen especially often on
>>>> some of the 9984 NICs in mid-range platforms.
>>>>
>>>> Signed-off-by: Ben Greear <greearb at candelatech.com>
>>>
>>> [...]
>>>
>>>> -static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> - const struct pci_device_id *pci_dev)
>>>> +static int __ath10k_pci_probe(struct pci_dev *pdev,
>>>> + const struct pci_device_id *pci_dev)
>>>> {
>>>> int ret = 0;
>>>> struct ath10k *ar;
>>>> @@ -3672,6 +3672,22 @@ static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> return ret;
>>>> }
>>>>
>>>> +static int ath10k_pci_probe(struct pci_dev *pdev,
>>>> + const struct pci_device_id *pci_dev)
>>>> +{
>>>> + int cnt = 0;
>>>> + int rv;
>>>> + do {
>>>> + rv = __ath10k_pci_probe(pdev, pci_dev);
>>>> + if (rv == 0)
>>>> + return rv;
>>>> + pr_err("ath10k: failed to probe PCI : %d, retry-count: %d\n", rv, cnt);
>>>> + mdelay(10); /* let the ath10k firmware gerbil take a small break */
>>>> + } while (cnt++ < 10);
>>>> + return rv;
>>>> +}
>>>
>>> This is a sledgehammer approach and it causes reload for all error
>>> cases, like when hardware is broken or memory allocation is failing.
>>>
>>> When the problem happens does it always fail at the same place? Is
>>> it hw reset or something else? It's better to retry the individual action
>>> than to do this hack. Or is it just some more delay needed somewhere?
>>
>> I am seeing WMI timeouts during initial firmware load and wait on
>> QCA9984 + BCM7444S SoC.
>> My guess is the WMI wakeup time is not "right" enough and needs to be
>> extended a little bit.
>>
>> But then, I have played a lot of whackamole with WMI timeouts during
>> my loooong porting effort..
>
> The failure I saw was a failure to wake pci, and from comments, it seems that
> the current wait is longer than what should be required, and it warns on slow
> wakes, and I never saw that warning. So I assume that waiting longer would not help.
>
> I saw it fail twice in a row to wake pci and then succeed on the third try,
> for instance, when testing my patch.
>
> As for a big hammer, I guess we could check for certain return codes if you think
> that is better than just retrying all failures?
ath10k_pci_probe() has a lot of stuff which should not affect your
problem, like allocating memory, setting up timers and interrupts etc.
It's quite ugly to redo all of that in every cycle. A more fine-grained
solution, like looping a specific action (reset, wake, whatever), is
much preferred.
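As a rough illustration of retrying only the failing step, here is a small
userspace sketch. It is not ath10k code: fake_pci_wake() and
wake_with_retry() are hypothetical names, the wake step is simulated, and
treating -ETIMEDOUT as the only retryable error is an assumption. Any other
error is returned at once, so broken hardware or a failed allocation would
not trigger a retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the one step that fails intermittently
 * (for example waking the PCI device). It times out a configurable
 * number of times and then succeeds.
 */
static int fake_wake_failures_left;

static int fake_pci_wake(void)
{
	if (fake_wake_failures_left > 0) {
		fake_wake_failures_left--;
		return -ETIMEDOUT;
	}
	return 0;
}

/* Retry only the flaky step, and only for the error that indicates the
 * transient condition (assumed here to be -ETIMEDOUT). Stores the number
 * of attempts made in *tries_used when that pointer is not NULL.
 */
static int wake_with_retry(int (*wake)(void), int max_tries, int *tries_used)
{
	int ret = 0;
	int i;

	assert(max_tries > 0);

	for (i = 0; i < max_tries; i++) {
		ret = wake();
		if (ret != -ETIMEDOUT)
			break; /* success, or an error that retrying won't fix */
		fprintf(stderr, "wake timed out, attempt %d of %d\n",
			i + 1, max_tries);
		/* a real driver would sleep or delay here before retrying */
	}

	if (tries_used)
		*tries_used = (i < max_tries) ? i + 1 : max_tries;
	return ret;
}
```

With something along these lines, the memory allocation, timer and interrupt
setup in probe run once, and only the wake is repeated.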
Do you have debug logs of failing cases?
--
Kalle Valo