[RFC] ath10k: Attempt to work around napi_synchronize hang.
Ben Greear
greearb at candelatech.com
Wed Feb 28 10:05:04 PST 2018
On 02/28/2018 09:31 AM, Michał Kazior wrote:
> On 28 February 2018 at 02:22, <greearb at candelatech.com> wrote:
> [...]
>> @@ -2086,8 +2087,28 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
>> ath10k_pci_irq_disable(ar);
>> ath10k_pci_irq_sync(ar);
>> ath10k_pci_flush(ar);
>> - napi_synchronize(&ar->napi);
>> - napi_disable(&ar->napi);
>> +
>> + /* Calling napi_disable twice in a row (w/out starting it and/or without
>> + * having NAPI active leads to deadlock because napi_disable sets
>> + * NAPI_STATE_SCHED and NAPI_STATE_NPSVC when it returns, as far as I
>> + * can tell. So, guard this call to napi_disable. I believe the
>> + * failure case is something like this:
>> + * rmmod ath10k_pci ath10k_core
>> + * Firmware crashes before hif_stop is called by the rmmod path
>> + * The crash handling logic calls hif_stop
>> + * Then rmmod gets around to calling hif_stop, but spins endlessly
>> + * in napi_synchronize.
>> + *
>> + * I think one way this could happen is that ath10k_stop checks
>> + * for state != ATH10K_STATE_OFF, but STATE_RESTARTING is also
>> + * a possibility. That might be how we can have hif_stop called twice
>> + * without a hif_start in between. --Ben
>> + */
>> + if (ar->napi_enabled) {
>> + napi_synchronize(&ar->napi);
>> + napi_disable(&ar->napi);
>> + ar->napi_enabled = false;
>> + }
>
> Looking at the code and your comment the described fail case seems
> legit. I would consider tuning ath10k_stop() so that it calls
> ath10k_halt() only if ar->state != OFF & ar->state != RESTARTING
> though. Or did you try already?
I did not try tuning ath10k_stop(). The code in this area is quite complex,
and in my opinion trying to keep the start/stop calls exactly matched without
individual 'has_started' flags seems ripe for bugs.
> While your approach will probably work it won't prevent other non-NAPI
> bad things from happening. Even if there's nothing today it might
> creep up in the future. And you'd need to update ahb.c too.
I'll update ahb.c to match.
Thanks,
Ben
>
>
> Michał
>
--
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc http://www.candelatech.com
More information about the ath10k
mailing list