Anyone seeing tx-credits 'hang'?

Ben Greear greearb at candelatech.com
Mon Jan 12 08:51:59 PST 2015


On 01/12/2015 12:06 AM, Michal Kazior wrote:
> On 9 January 2015 at 17:55, Ben Greear <greearb at candelatech.com> wrote:
> [...]
>> One thing I noticed yesterday is that when the driver tries to put a
>> vdev down, the firmware will try to flush, and will delay vdev-down
>> event until fw is flushed.  I changed CT firmware to automatically
>> flush in this case, but perhaps the driver should explicitly ask
>> firmware to flush the vdev before putting it down?
> 
> I recall the discussion we once had. I do plan on doing a patch for
> that, eventually.

In this case, I am thinking of flushing just the particular vdev instead
of the entire set of vdevs.  I don't think flushing is the root cause of
my problems anyway, as I still see the issue after making my CT
firmware flush.  I think upstream firmware might require one
message per tid per peer, so it might be an issue to generate that
many wmi commands anyway... not sure.

>> Once the driver gets out of sync due to timeouts, the firmware
>> is likely to assert soon after if wmi hang doesn't happen because
>> firmware will think vdev is up when it is not, or vice versa.
>>
>> Also, I notice a pattern in the failure case.
>>
>> The sequence is almost always something like this:
>>
>> [lots of vdev up/down, re-associate, etc]
>>
>> vdev down (this would have timed out if I didn't put in the flush)
>>   * vdev down is usually last wmi cmd firmware receives.
>> driver tries to delete peer, that times out (firmware wmi layer never
>>   saw the command)
> 
> So there's a chance htc layer actually did get the buffer but for some
> reason it decided it isn't a wmi buffer. One reason could be the
> buffer contained garbage (e.g. due to missing barrier on host so
> firmware could read some data from an old physical address that was
> stored in ce descriptor item).
> 
> 
>> firmware reports one or two more messages to driver, and if it manages to report
>> a dbglog, that shows a tx-timeout message usually within a second of
>> the vdev down.  This happens whether or not I flush the vdev bringing it
>> down.
>>
>> At this point, one more request from driver may be sent, after that,
>> it is credit starvation.  Firmware continues to run (timers fire, etc).
>>
>> I think that firmware is also waiting on a completion event from the
>> CE layer...I plan to dig into that more today.
> 
> Hm.. This reminds me of issues hw1.0 had. I'd check if one of the
> workarounds ath10k had changes anything (see
> ath10k_ce_src_ring_write_index_set in ce.c in 5e3dd157ce).

Thanks, I'll go take a look at this today.

Ben



-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



