[PATCH v2 04/21] ath10k: rate-limit packet tx errors
Ben Greear
greearb at candelatech.com
Thu Sep 15 08:22:37 PDT 2016
On 09/15/2016 06:59 AM, Valo, Kalle wrote:
> Ben Greear <greearb at candelatech.com> writes:
>
>> On 09/14/2016 07:07 AM, Valo, Kalle wrote:
>>> greearb at candelatech.com writes:
>>>
>>>> From: Ben Greear <greearb at candelatech.com>
>>>>
>>>> When firmware crashes, stack can continue to send packets
>>>> for a bit, and existing code was spamming logs.
>>>>
>>>> So, rate-limit the error message for tx failures.
>>>>
>>>> Signed-off-by: Ben Greear <greearb at candelatech.com>
>>>> ---
>>>> drivers/net/wireless/ath/ath10k/mac.c | 5 +++--
>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
>>>> index cd3016d..42cac32 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/mac.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/mac.c
>>>> @@ -3432,8 +3432,9 @@ static int ath10k_mac_tx_submit(struct ath10k *ar,
>>>> }
>>>>
>>>> if (ret) {
>>>> - ath10k_warn(ar, "failed to transmit packet, dropping: %d\n",
>>>> - ret);
>>>> + if (net_ratelimit())
>>>> + ath10k_warn(ar, "failed to transmit packet, dropping: %d\n",
>>>> + ret);
>>>> ieee80211_free_txskb(ar->hw, skb);
>>>> }
>>>
>>> ath10k_warn() is already rate limited. If there's something wrong then
>>> that function should be fixed, not the callers.
>>>
>>> void ath10k_warn(struct ath10k *ar, const char *fmt, ...)
>>> {
>>> struct va_format vaf = {
>>> .fmt = fmt,
>>> };
>>> va_list args;
>>>
>>> va_start(args, fmt);
>>> vaf.va = &args;
>>> dev_warn_ratelimited(ar->dev, "%pV", &vaf);
>>> trace_ath10k_log_warn(ar, &vaf);
>>>
>>> va_end(args);
>>> }
>>
>> The problem with having the ratelimit here is that you may miss
>> rare warnings due to a flood of common warnings.
>>
>> That is why it is still useful to ratelimit potential floods
>> of warnings.
>
> I think this is a common problem in kernel, not specific to ath10k. For
> starters you could configure the limits dev_warn_ratelimited() has, not
> trying to workaround it in the driver.
I will try to explain this once more.
If the ratelimit lives in one centralized place, then every caller is
rate-limited with the same counter and each call site gets the same priority.
A single verbose caller can thus suppress the logs from much rarer callers.
My patch pre-filters one of the verbose callers, so the rarer and more
interesting callers are more likely to get their messages printed,
and those are the messages that are useful for debugging.
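To make the argument above concrete, here is a small userspace sketch.
It is not the kernel's ratelimit code: struct limiter, limiter_allow() and
rare_message_printed() are hypothetical names, and the interval, burst and
flood sizes are arbitrary numbers picked for illustration. It models one
shared limiter (standing in for the one inside ath10k_warn) and an optional
second limiter in front of the noisy caller (standing in for the
net_ratelimit() check in my patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a ratelimit state: allow at most `burst`
 * messages per window of `interval` ticks. Assumption: this only
 * approximates the kernel's behaviour. */
struct limiter {
	int interval;
	int burst;
	int begin;   /* tick at which the current window started */
	int printed; /* messages allowed so far in this window */
};

static void limiter_init(struct limiter *rl, int interval, int burst)
{
	assert(interval > 0 && burst > 0);
	rl->interval = interval;
	rl->burst = burst;
	rl->begin = 0;
	rl->printed = 0;
}

/* Returns true if a message may be printed at tick `now`. */
static bool limiter_allow(struct limiter *rl, int now)
{
	if (now - rl->begin >= rl->interval) {
		/* Window expired: start a new one. */
		rl->begin = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	return false;
}

/* A noisy caller tries to log 100 times per tick. A rare caller logs
 * once, at `rare_tick`, after that tick's flood. Returns whether the
 * rare message got through the shared limiter. */
static bool rare_message_printed(bool prefilter_noisy, int rare_tick)
{
	struct limiter shared, noisy_site;
	bool printed = false;

	limiter_init(&shared, 5, 10);    /* the centralized limiter */
	limiter_init(&noisy_site, 5, 2); /* pre-filter at the noisy call site */

	for (int tick = 0; tick <= rare_tick; tick++) {
		for (int i = 0; i < 100; i++) {
			/* With the pre-filter, most of the flood never
			 * reaches the shared limiter. */
			if (prefilter_noisy && !limiter_allow(&noisy_site, tick))
				continue;
			(void)limiter_allow(&shared, tick);
		}
		if (tick == rare_tick)
			printed = limiter_allow(&shared, tick);
	}
	return printed;
}
```

In this model, rare_message_printed(false, 3) returns false because the flood
has already used up the shared burst, while rare_message_printed(true, 3)
returns true because the pre-filter leaves room in the shared budget.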
>> I would like to remove the ratelimit from ath10k_warn eventually.
>
> I think that's not a good idea, it might cause unnecessary host reboots
> in problem cases. Rate limiting the messages is a much better option.
OK, but even so, that would be a later patch, and it is not a reason
to reject the one I posted.
For what it is worth, my users and I have been running such a patch for years
on various embedded and other systems, and it works fine.
Thanks,
Ben
--
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc http://www.candelatech.com