Anyone seeing tx-credits 'hang'?

Ben Greear greearb at candelatech.com
Wed Jan 21 07:42:34 PST 2015



On 01/20/2015 11:22 PM, Michal Kazior wrote:
> On 20 January 2015 at 05:34, Ben Greear <greearb at candelatech.com> wrote:
>> Ok, so I think I've mostly got this figured out...at least enough to
>> work around the problem.
>>
>> It seems that the firmware and/or NIC hardware stops doing CE interrupts
>> for the WMI rings (at least).  If I force a poll of
>> the rings, then packets are found and may be processed.
>
> So you just keep calling ath10k_hif_send_complete_check() (or
> ath10k_ce_per_engine_service) for polling, right?

The polling is in firmware, but it calls the firmware variants
of these functions.

I did actually add polling in the host as well, but that did not
fix the problem.  I will back that out and make sure the problem
remains fixed with just the firmware changes and host keep-alive
messages to enable the firmware changes.

>> In one case I looked at closely, it seems IRQs went away for around 30
>> seconds, and then for no obvious reason IRQs for the rings started being
>> delivered and processed again. ~20 WMI messages were processed due to
>> polling CE rings in this interval.
>
> Out of curiosity - what irq mode are you using? Shared or MSI? Or did
> you try both?

Probably MSI, but I don't actually know.  Is there an easy way to tell?

>> The combination of WMI keep-alive messages sent from host, and
>> timer to check for timeouts (and do CE polling at higher intervals
>> when timeout is detected) appears to be enough.  I also check
>> for the IRQ working again and stop the polling at that time.
>>
>> I plan to clean the firmware changes up and commit them to my
>> own repo...but it will require host changes to enable the keep-alive
>> to fully work around this problem.  Probably none of this will make
>> it upstream....
>
> We could add a watchdog to WMI which uses the `echo` command and look
> at echo events and tx credit completion (WMI is notified about that).
> In case neither comes in in a timely fashion (let's say 1s which is
> less than WMI command timeout of 3s) we start polling until things
> settle down. This should work with standard firmware, no?

Since it is the firmware that has to do the CE polling, I don't see any
way to resolve this without hacking firmware.  You also need a new message
that the host sends to the firmware and that the firmware can be sure is
periodic, so it can use it as its WMI keep-alive timer.  That is why I made
a new message type for this (otherwise it cannot really be backwards
compatible with old kernels that do not send regular keep-alives, but *may*
send any other valid message type for whatever reason whenever they want).
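
To make the idea concrete, here is a minimal sketch of the keep-alive
timeout and polling logic described above.  It is not the actual firmware
code: every name in it (ce_watchdog, wd_tick, and so on) is made up for
illustration, and it assumes a millisecond clock and a periodic timer tick
that the caller can run more often while polling is active.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names here are hypothetical; none come from ath10k or the firmware. */
enum wd_state { WD_IDLE, WD_POLLING };

struct ce_watchdog {
    enum wd_state state;
    uint32_t last_keepalive_ms; /* when the last keep-alive was processed */
    uint32_t timeout_ms;        /* silence longer than this starts polling */
    uint32_t polls;             /* number of forced ring polls so far */
};

static void wd_init(struct ce_watchdog *wd, uint32_t now_ms,
                    uint32_t timeout_ms)
{
    wd->state = WD_IDLE;
    wd->last_keepalive_ms = now_ms;
    wd->timeout_ms = timeout_ms;
    wd->polls = 0;
}

/*
 * Called whenever a keep-alive message is processed.  via_irq is true if
 * it arrived through a normal CE interrupt, false if a forced poll found it.
 */
static void wd_keepalive_seen(struct ce_watchdog *wd, uint32_t now_ms,
                              bool via_irq)
{
    wd->last_keepalive_ms = now_ms;
    if (via_irq)
        wd->state = WD_IDLE; /* interrupts work again, stop polling */
}

/* Periodic timer tick.  Returns true if the caller should poll the rings. */
static bool wd_tick(struct ce_watchdog *wd, uint32_t now_ms)
{
    if (wd->state == WD_IDLE &&
        now_ms - wd->last_keepalive_ms > wd->timeout_ms)
        wd->state = WD_POLLING; /* keep-alive overdue, assume IRQs are lost */

    if (wd->state == WD_POLLING) {
        wd->polls++;
        return true;
    }
    return false;
}
```

The point of the via_irq flag is that a keep-alive found by polling only
proves the ring has data, not that interrupts are back, so polling continues
until a keep-alive is actually delivered by an interrupt.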

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


