NOHZ tick-stop error with ath10k SDIO

Thomas Gleixner tglx at linutronix.de
Sun Sep 5 06:00:32 PDT 2021


Fabio,

On Sat, Sep 04 2021 at 18:10, Fabio Estevam wrote:
> On Fri, Sep 3, 2021 at 5:07 AM Thomas Gleixner <tglx at linutronix.de> wrote:
> I did as suggested and here is trace.txt:
> https://pastebin.com/VUfLRJ8a

Lacks a stack trace, but yes this one is the culprit:

kworker/u4:2-70      [000] d..1    87.940929: softirq_raise: vec=3 [action=NET_RX]

It runs in task context with only interrupts and preemption disabled.
So if no interrupt is raised and no local_bh_disable() /
local_bh_enable() pair is invoked before the CPU goes idle, nothing will
handle the softirq and the raised bit stays pending, which makes the
NOHZ idle code complain.
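
To make the mechanism above concrete, here is a minimal userspace sketch of
it. It is a toy model, not kernel code: every toy_* name is hypothetical, and
it assumes the raise only sets the pending bit (as the trace line suggests)
without waking any softirq thread. Vector 3 is taken from the trace; with only
that bit set the pending mask is 0x08, which is consistent with the
"handler #08" in the reported message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. None of these names exist in the kernel. */
enum { TOY_NET_RX = 3, TOY_NR_SOFTIRQS = 10 };

struct toy_cpu {
    unsigned int pending;      /* one bit per raised softirq vector */
    int bh_disable_depth;      /* nesting depth of toy_bh_disable() */
};

/* Pretend to run every pending handler, then clear the mask. */
static void toy_run_softirqs(struct toy_cpu *cpu)
{
    cpu->pending = 0;
}

/* Models a bare raise: set the bit and do nothing else. */
static void toy_raise_softirq(struct toy_cpu *cpu, int vec)
{
    assert(vec >= 0 && vec < TOY_NR_SOFTIRQS);
    cpu->pending |= 1u << vec;
}

static void toy_bh_disable(struct toy_cpu *cpu)
{
    cpu->bh_disable_depth++;
}

/* Leaving the outermost BH-disabled section processes pending work. */
static void toy_bh_enable(struct toy_cpu *cpu)
{
    assert(cpu->bh_disable_depth > 0);
    if (--cpu->bh_disable_depth == 0 && cpu->pending)
        toy_run_softirqs(cpu);
}

/* Returning from an interrupt is the other point where work is handled. */
static void toy_irq_exit(struct toy_cpu *cpu)
{
    if (cpu->bh_disable_depth == 0 && cpu->pending)
        toy_run_softirqs(cpu);
}

/* The idle entry check: anything still pending is reported as an error. */
static bool toy_idle_would_complain(const struct toy_cpu *cpu)
{
    return cpu->pending != 0;
}
```

In this model, raising TOY_NET_RX from plain task context and then going idle
leaves pending at 0x08, so toy_idle_would_complain() returns true. Wrapping
the same raise in toy_bh_disable() / toy_bh_enable(), or having
toy_irq_exit() run before idle, clears the mask.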

> Also, while investigating this problem I saw a commit that fixed a
> similar issue:
> e63052a5dd3c ("mlx5e: add add missing BH locking around napi_schdule()").
>
> I then tried the same approach on the ath10k sdio driver:
>
> diff --git a/drivers/net/wireless/ath/ath10k/sdio.c b/drivers/net/wireless/ath/ath10k/sdio.c
> index b746052737e0..eb705214f3f0 100644
> --- a/drivers/net/wireless/ath/ath10k/sdio.c
> +++ b/drivers/net/wireless/ath/ath10k/sdio.c
> @@ -1363,8 +1363,11 @@ static void ath10k_rx_indication_async_work(struct work_struct *work)
>          ep->ep_ops.ep_rx_complete(ar, skb);
>      }
>
> -    if (test_bit(ATH10K_FLAG_CORE_REGISTERED, &ar->dev_flags))
> +    if (test_bit(ATH10K_FLAG_CORE_REGISTERED, &ar->dev_flags)) {
> +        local_bh_disable();
>          napi_schedule(&ar->napi);
> +        local_bh_enable();
> +    }
>  }
>
> and no longer get the "NOHZ tick-stop error: Non-RCU local softirq work is
> pending, handler #08!!!" error messages after launching hostapd.
>
> Is this a proper fix?

Yes. This is correct. See above.

Thanks,

        tglx


