[RFC] ath10k: change to do napi_enable and napi_disable when insmod and rmmod for sdio
Wen Gong
wgong at codeaurora.org
Thu Aug 20 22:45:20 EDT 2020
On 2020-08-21 04:59, Ben Greear wrote:
> On 8/20/20 1:15 PM, Krishna Chaitanya wrote:
>> On Thu, Aug 20, 2020 at 11:23 PM Ben Greear <greearb at candelatech.com>
>> wrote:
>>>
>>> On 8/20/20 10:42 AM, Krishna Chaitanya wrote:
>>>> On Thu, Aug 20, 2020 at 11:11 PM Krishna Chaitanya
>>>> <chaitanya.mgit at gmail.com> wrote:
>>>>>
>>>>> On Thu, Aug 20, 2020 at 10:38 PM Ben Greear
>>>>> <greearb at candelatech.com> wrote:
>>>>>>
>>>>>> On 8/20/20 10:00 AM, Krishna Chaitanya wrote:
>>>>>>> On Thu, Aug 20, 2020 at 10:02 PM Ben Greear
>>>>>>> <greearb at candelatech.com> wrote:
>>>>>>>>
>>>>>>>> On 8/20/20 9:08 AM, Krishna Chaitanya wrote:
>>>>>>>>> On Thu, Aug 20, 2020 at 8:07 PM Wen Gong <wgong at codeaurora.org>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 2020-08-20 18:52, Krishna Chaitanya wrote:
>>>>>>>>>>> On Thu, Aug 20, 2020 at 3:45 PM Wen Gong
>>>>>>>>>>> <wgong at codeaurora.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 2020-08-20 17:19, Krishna Chaitanya wrote:
>>>>>>>>>> ...
>>>>>>>>>>>>>> I'm not really convinced that this is the right fix, but
>>>>>>>>>>>>>> I'm no NAPI
>>>>>>>>>>>>>> expert. Can anyone else help?
>>>>>>>>>>>>> Calling napi_disable() twice can lead to hangs, but moving
>>>>>>>>>>>>> NAPI from
>>>>>>>>>>>>> start/stop to
>>>>>>>>>>>>> the probe isn't the right approach as the datapath is tied
>>>>>>>>>>>>> to
>>>>>>>>>>>>> start/stop.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe check the state of NAPI before disable?
>>>>>>>>>>>>>
>>>>>>>>>>>>> if (test_bit(NAPI_STATE_SCHED, &ar->napi.state))
>>>>>>>>>>>>>         napi_disable(&ar->napi);
>>>>>>>>>>>>> or maintain napi_state like this
>>>>>>>>>>>>> https://patchwork.kernel.org/patch/10249365/
>>>>>>>>>>>> It is better to use the patch from the link above.
>>>>>>>>>>>> napi.state is controlled by the NAPI API, so it is better that
>>>>>>>>>>>> ath10k does not depend on it.
>>>>>>>>>>> Sure, but IMHO just canceling the async rx work should solve
>>>>>>>>>>> the issue.
>>>>>>>>>> Oh no, canceling the async rx work will not solve this issue.
>>>>>>>>>> The rx worker ath10k_rx_indication_async_work calls
>>>>>>>>>> napi_schedule, and NAPI_STATE_SCHED is cleared after
>>>>>>>>>> napi_complete.
>>>>>>>>>> The issue this patch addresses arises because two threads call
>>>>>>>>>> hif_stop and NAPI_STATE_SCHED is not cleared.
>>>>>>>>> That fix is still valid and good to have.
>>>>>>>>>
>>>>>>>>> ndev_stop being called twice is a typical scenario (stop vs
>>>>>>>>> rmmod), so
>>>>>>>>> just checking the netdev_flags for IFF_UP and returning
>>>>>>>>> from hif_stop
>>>>>>>>> should suffice, no?
>>>>>>>>
>>>>>>>> My approach to fix this problem was to add a boolean in ath10k
>>>>>>>> as to whether
>>>>>>>> it had napi enabled or not, and then check that before trying to
>>>>>>>> enable/disable
>>>>>>>> it again. Seems to work fine, and cleaner in my mind than
>>>>>>>> checking internal
>>>>>>>> napi flags.
>>>>>>> A much simpler approach is just to check for IFF_UP and skip NAPI
>>>>>>> (and others)
>>>>>>> in the hif_stop no? (provided proper RTNL locking is done if
>>>>>>> hif_stop
>>>>>>> is being called
>>>>>>> internally as well).
>>>>>>>
>>>>>>
>>>>>> I'm not sure, but I think the driver should be internally
>>>>>> consistent and not
>>>>>> spend a lot of time trying to guess about interactions with
>>>>>> objects higher
>>>>>> in the stack.
>>>>> Fair enough, the network interface state is a basic thing
>>>>> controlled
>>>>> by the driver,
>>>>> so, should be okay to use. Anyways, the in-driver approach has more
>>>>> control.
>>>>>>
>>>>>> Here is my original patch to fix this, it is not complex.
>>>>>>
>>>>>> https://patchwork.kernel.org/patch/10249363/
>>>>> Sure, I have shared your patch above :).
>>>> Sent a bit early, any idea why this wasn't upstreamed earlier?
>>>
>>> No, one comment from Michal indicated maybe there were more problems
>>> lurking
>>> in this area, but he seemed to be OK with the patch over all. After
>>> that,
>>> it was just ignored.
>>>
>> Now might be a good time to push for it :)
>>
>
> It is generally a waste of time in my experience. Kalle is the
> maintainer and should
> be seeing any of this he cares to see. If he likes the patch, he can
> apply it or
> something similar. If you have a reproducible test case, see if the
> patch fixes
> things, that might help it be accepted.
I have two commands; each one reproduces the hang:

echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash; sleep 0.05; ifconfig wlan0 down

echo soft > /sys/kernel/debug/ieee80211/phy0/ath10k/simulate_fw_crash; rmmod ath10k_sdio

With my patch applied, the hang is fixed. My change is similar to your
patch (https://patchwork.kernel.org/patch/10249365/), so your patch
should fix the hang as well.
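
To illustrate the flag-guarded approach discussed in this thread, here is a
minimal userspace sketch. It is not the code from either patch: struct
mock_napi, struct mock_ar, and all mock_* functions are hypothetical
stand-ins for the kernel objects, and the sketch is single-threaded, so it
ignores the locking a real driver would need when two threads reach
hif_stop. The mock disable asserts where a second real napi_disable() call
could hang.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for struct napi_struct; not the kernel type. */
struct mock_napi {
	bool enabled;
	int disable_calls;
};

/* Hypothetical driver context; napi_enabled is the driver-owned flag. */
struct mock_ar {
	struct mock_napi napi;
	bool napi_enabled;
};

static void mock_napi_enable(struct mock_napi *n)
{
	n->enabled = true;
}

static void mock_napi_disable(struct mock_napi *n)
{
	/* A second disable could hang in the kernel; the mock asserts instead. */
	assert(n->enabled);
	n->enabled = false;
	n->disable_calls++;
}

static void mock_hif_start(struct mock_ar *ar)
{
	if (ar->napi_enabled)
		return;		/* already enabled, nothing to do */
	mock_napi_enable(&ar->napi);
	ar->napi_enabled = true;
}

static void mock_hif_stop(struct mock_ar *ar)
{
	if (!ar->napi_enabled)
		return;		/* second stop (e.g. stop, then rmmod) is a no-op */
	mock_napi_disable(&ar->napi);
	ar->napi_enabled = false;
}
```

Calling mock_hif_start() once and mock_hif_stop() twice reaches
mock_napi_disable() only once, which is the behaviour the flag is meant to
guarantee.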
>
> Thanks,
> Ben