[RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio

Wen Gong wgong at codeaurora.org
Thu Aug 20 22:45:20 EDT 2020


On 2020-08-21 04:59, Ben Greear wrote:
> On 8/20/20 1:15 PM, Krishna Chaitanya wrote:
>> On Thu, Aug 20, 2020 at 11:23 PM Ben Greear <greearb at candelatech.com> 
>> wrote:
>>> 
>>> On 8/20/20 10:42 AM, Krishna Chaitanya wrote:
>>>> On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya
>>>> <chaitanya.mgit at gmail.com> wrote:
>>>>> 
>>>>> On Thu, Aug 20, 2020 at 10:38 PM Ben Greear 
>>>>> <greearb at candelatech.com> wrote:
>>>>>> 
>>>>>> On 8/20/20 10:00 AM, Krishna Chaitanya wrote:
>>>>>>> On Thu, Aug 20, 2020 at 10:02 PM Ben Greear 
>>>>>>> <greearb at candelatech.com> wrote:
>>>>>>>> 
>>>>>>>> On 8/20/20 9:08 AM, Krishna Chaitanya wrote:
>>>>>>>>> On Thu, Aug 20, 2020 at 8:07 PM Wen Gong <wgong at codeaurora.org> 
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> On 2020-08-20 18:52, Krishna Chaitanya wrote:
>>>>>>>>>>> On Thu, Aug 20, 2020 at 3:45 PM Wen Gong 
>>>>>>>>>>> <wgong at codeaurora.org> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2020-08-20 17:19, Krishna Chaitanya wrote:
>>>>>>>>>> ...
>>>>>>>>>>>>>> I'm not really convinced that this is the right fix, but 
>>>>>>>>>>>>>> I'm no NAPI
>>>>>>>>>>>>>> expert. Can anyone else help?
>>>>>>>>>>>>> Calling napi_disable() twice can lead to hangs, but moving 
>>>>>>>>>>>>> NAPI from
>>>>>>>>>>>>> start/stop to
>>>>>>>>>>>>> the probe isn't the right approach as the datapath is tied 
>>>>>>>>>>>>> to
>>>>>>>>>>>>> start/stop.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Maybe check the state of NAPI before disable?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>      if (test_bit(NAPI_STATE_SCHED, &ar->napi.napi.state))
>>>>>>>>>>>>>       napi_disable(&ar->napi)
>>>>>>>>>>>>> or maintain napi_state like this
>>>>>>>>>>>>> https://patchwork.kernel.org/patch/10249365/
>>>>>>>>>>>> it is better to use above link's patch.
>>>>>>>>>>>> napi.state is controlled by napi API, it is better ath10k 
>>>>>>>>>>>> not know it.
>>>>>>>>>>> Sure, but IMHO just canceling the async rx work should solve 
>>>>>>>>>>> the issue.
>>>>>>>>>> Oh no, canceling the async rx work will not solve this issue, 
>>>>>>>>>> rx worker
>>>>>>>>>> ath10k_rx_indication_async_work call napi_schedule, after 
>>>>>>>>>> napi_complete,
>>>>>>>>>> the NAPI_STATE_SCHED will clear.
>>>>>>>>>> The issue of this patch is because 2 thread called to hif_stop 
>>>>>>>>>> and
>>>>>>>>>> NAPI_STATE_SCHED not clear.
>>>>>>>>> That fix is still valid and good to have.
>>>>>>>>> 
>>>>>>>>> ndev_stop being called twice is typical scenarios (stop vs 
>>>>>>>>> rmmod), so
>>>>>>>>>      just checking the netdev_flags for IFF_UP and returning 
>>>>>>>>> from hif_Stop
>>>>>>>>> should suffice, no?
>>>>>>>> 
>>>>>>>> My approach to fix this problem was to add a boolean in ath10k 
>>>>>>>> as to whether
>>>>>>>> it had napi enabled or not, and then check that before trying to 
>>>>>>>> enable/disable
>>>>>>>> it again.  Seems to work fine, and cleaner in my mind than 
>>>>>>>> checking internal
>>>>>>>> napi flags.
>>>>>>> A much simpler approach is just to check for IFF_UP and skip NAPI 
>>>>>>> (and others)
>>>>>>> in the hif_stop no? (provided proper RTNL locking is done if 
>>>>>>> hif_stop
>>>>>>> is being called
>>>>>>> internally as well).
>>>>>>> 
>>>>>> 
>>>>>> I'm not sure, but I think the driver should be internally 
>>>>>> consistent and not
>>>>>> spend a lot of time trying to guess about interactions with 
>>>>>> objects higher
>>>>>> in the stack.
>>>>> Fair enough, the network interface state is a basic thing 
>>>>> controlled
>>>>> by the driver,
>>>>> so, should be okay to use. Anyways, the in-driver approach has more 
>>>>> control.
>>>>>> 
>>>>>> Here is my original patch to fix this, it is not complex.
>>>>>> 
>>>>>> https://patchwork.kernel.org/patch/10249363/
>>>>> Sure, I have shared your patch above :).
>>>> Sent a bit early, any idea why this wasn't upstreamed earlier?
>>> 
>>> No, one comment from Michal indicated maybe there were more problems 
>>> lurking
>>> in this area, but he seemed to be OK with the patch over all.  After 
>>> that,
>>> it was just ignored.
>>> 
>> Now might be a good time to push for it :)
>> 
> 
> It is generally a waste of time in my experience.  Kalle is the
> maintainer and should
> be seeing any of this he cares to see.  If he likes the patch, he can
> apply it or
> something similar.  If you have a reproducible test case, see if the 
> patch fixes
> things, that might help it be accepted.
I have 2 cmd, each one can reproduce the hang.
echo soft > 
/sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;sleep 
0.05;ifconfig wlan0 down
echo soft > 
/sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash;rmmod 
ath10k_sdio
and with the my patch, it fix the hang. Change of my patch is similar 
with your
patch(https://patchwork.kernel.org/patch/10249365/), so it should also 
fix the hang with your patch.
> 
> Thanks,
> Ben



More information about the ath10k mailing list