[PATCH v3] ath10k: Fix crash during rmmod when probe firmware fails

Mohammed Shafi Shajakhan mohammed at codeaurora.org
Sun Feb 5 22:02:32 PST 2017


Hi Kalle,

sorry for the delay

On Wed, Jan 25, 2017 at 01:46:28PM +0000, Valo, Kalle wrote:
> Kalle Valo <kvalo at qca.qualcomm.com> writes:
> 
> > Mohammed Shafi Shajakhan <mohammed at qti.qualcomm.com> writes:
> >
> >> From: Mohammed Shafi Shajakhan <mohammed at qti.qualcomm.com>
> >>
> >> This fixes the below crash when ath10k probe firmware fails,
> >> NAPI polling tries to access a rx ring resource which was never
> >> allocated, fix this by disabling NAPI right away once the probe
> >> firmware fails by calling 'ath10k_hif_stop'. Its good to note
> >> that the error is never propogated to 'ath10k_pci_probe' when
> >> ath10k_core_register fails, so calling 'ath10k_hif_stop' to cleanup
> >> PCI related things seems to be ok
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at (null)
> >> IP:  __ath10k_htt_rx_ring_fill_n+0x19/0x230 [ath10k_core]
> >> __ath10k_htt_rx_ring_fill_n+0x19/0x230 [ath10k_core]
> >>
> >> Call Trace:
> >>
> >> [<ffffffffa113ec62>] ath10k_htt_rx_msdu_buff_replenish+0x42/0x90
> >> [ath10k_core]
> >> [<ffffffffa113f393>] ath10k_htt_txrx_compl_task+0x433/0x17d0
> >> [ath10k_core]
> >> [<ffffffff8114406d>] ? __wake_up_common+0x4d/0x80
> >> [<ffffffff811349ec>] ? cpu_load_update+0xdc/0x150
> >> [<ffffffffa119301d>] ? ath10k_pci_read32+0xd/0x10 [ath10k_pci]
> >> [<ffffffffa1195b17>] ath10k_pci_napi_poll+0x47/0x110 [ath10k_pci]
> >> [<ffffffff817863af>] net_rx_action+0x20f/0x370
> >>
> >> Reported-by: Ben Greear <greearb at candelatech.com>
> >> Fixes: 3c97f5de1f28 ("ath10k: implement NAPI support")
> >> Signed-off-by: Mohammed Shafi Shajakhan <mohammed at qti.qualcomm.com>
> >
> > Is there an easy way to reproduce this bug? I don't see it on my x86
> > laptop with qca988x and I call rmmod all the time. I would like to test
> > this myself.
> >
> >> --- a/drivers/net/wireless/ath/ath10k/core.c
> >> +++ b/drivers/net/wireless/ath/ath10k/core.c
> >> @@ -2164,6 +2164,7 @@ static int ath10k_core_probe_fw(struct ath10k *ar)
> >>  	ath10k_core_free_firmware_files(ar);
> >>  
> >>  err_power_down:
> >> +	ath10k_hif_stop(ar);
> >>  	ath10k_hif_power_down(ar);
> >>  
> >>  	return ret;
> >
> > This breaks the symmetry, we should not be calling ath10k_hif_stop() if
> > we haven't called ath10k_hif_start() from the same function. This can
> > just create a bigger mess later, for example with other bus support like
> > sdio or usb. In theory it should enough that we call
> > ath10k_hif_power_down() and pci.c does the rest correctly "behind the
> > scenes".
> >
> > I investigated this a bit and I think the real cause is that we call
> > napi_enable() from ath10k_pci_hif_power_up() and napi_disable() from
> > ath10k_pci_hif_stop(). Does anyone remember why?
> >
> > I was expecting that we would call napi_enable()/napi_disable() either
> > in ath10k_hif_power_up/down() or ath10k_hif_start()/stop(), but not
> > mixed like it's currently.
> 
> So below is something I was thinking of, now napi_enable() is called
> from ath10k_hif_start() and napi_disable() from ath10k_hif_stop(). Would
> that work?
> 
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -1648,6 +1648,8 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
>  
>  	ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif start\n");
>  
> +	napi_enable(&ar->napi);
> +
>  	ath10k_pci_irq_enable(ar);
>  	ath10k_pci_rx_post(ar);
>  
> @@ -2532,7 +2534,6 @@ static int ath10k_pci_hif_power_up(struct ath10k *ar)
>  		ath10k_err(ar, "could not wake up target CPU: %d\n", ret);
>  		goto err_ce;
>  	}
> -	napi_enable(&ar->napi);
>  
>  	return 0;
>

[shafi] I think I tried this change some time back, but it had some regression
during device start up, let me check this once and get back to you.

regards,
shafi



More information about the ath10k mailing list