[RFT] ath10k: restart fw on tx-credit timeout

Wed Feb 11 22:55:18 PST 2015

On 11 February 2015 at 23:25, Ben Greear <greearb at candelatech.com> wrote:
> On 02/10/2015 09:01 AM, Ben Greear wrote:
>
>> I've hacked CT firmware to do a flush of all vdevs itself when it detects WMI hang.
>> I don't have a good test bed to reproduce the problem reliably, but I should know
>> after a few days if the flush works at all.  If not, then it's a moot point anyway.
>
> So, this appears to at least partially work.
>
> But, what we notice is that when using multiple station vdevs, the system pretty much
> becomes useless if we get any significant number of stuck or slow-to-transmit management
> buffers over WMI.  Part of this is because WMI messages are sent when holding rtnl
> much of the time, I think.

Most, if not all, WMI commands are sent while holding conf_mutex. That
lock is taken in many situations, including while RTNL is held, so your
observation isn't entirely correct, but it isn't wrong either.
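
To illustrate the nesting, here is a rough userspace sketch. It is not
driver code: rtnl and conf_mutex are plain Python locks standing in for
the kernel locks, and both helper functions are made up for the
example. It only shows that while a config path sits on conf_mutex
under RTNL waiting for a WMI command, nobody else gets RTNL.

```python
# Sketch only: rtnl and conf_mutex are plain Python locks standing in for
# the kernel locks; both helpers below are hypothetical.
import threading

rtnl = threading.Lock()
conf_mutex = threading.Lock()


def wmi_cmd_while_locked(probe):
    """Model a config path: take RTNL, then conf_mutex, then send WMI."""
    with rtnl:
        with conf_mutex:
            # A real WMI command would block here waiting for tx-credits.
            # probe() runs at that point to show what other callers see.
            return probe()


def other_caller_can_take_rtnl():
    """Non-blocking attempt to take RTNL, as an unrelated caller would."""
    got = rtnl.acquire(blocking=False)
    if got:
        rtnl.release()
    return got
```

Calling wmi_cmd_while_locked(other_caller_can_take_rtnl) reports that
RTNL is unavailable for as long as the modelled WMI command is pending.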

> I would guess that an AP with lots of peers associated might have similar problems
> if peers are not ACKing packets reliably.

It's not the ACKing per se. It's whether stations are asleep and
unresponsive or not. You could do funny DoS attacks with a single
ath9k card (using virtual stations) on ath10k APs now I guess :-)

> Probably the only useful way to fix this is to make the firmware and driver able to
> send management frames over the normal transport like every other data packet?

Agreed. HTT should've been used for all traffic, including management frames.

The workaround could've been to guarantee only 1 wmi-mgmt-tx in
flight, but since tx-credits aren't replenished predictably you'd end
up with the patch I originally did, i.e. sleep 2*beacon interval and
issue wmi-peer-flush-tids after each unicast mgmt frame to a known
station.
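
Roughly, the workaround amounts to something like the sketch below.
This is a userspace model, not the actual patch: MgmtTxSerializer,
flush_delay_sec and the send/flush_tids callbacks are names invented
for the example, and the default beacon interval of 100 TU is an
assumption (a TU is 1024 microseconds).

```python
# Sketch only: every name here is hypothetical, not ath10k driver API.
import threading
import time

TU_USEC = 1024  # one 802.11 time unit is 1024 microseconds


def flush_delay_sec(beacon_interval_tu):
    """Return 2 * beacon interval, converted from TUs to seconds."""
    return 2 * beacon_interval_tu * TU_USEC / 1_000_000


class MgmtTxSerializer:
    """Allow only one management frame in flight at a time."""

    def __init__(self, send, flush_tids, beacon_interval_tu=100,
                 sleep=time.sleep):
        self._lock = threading.Lock()
        self._send = send              # stands in for wmi-mgmt-tx
        self._flush_tids = flush_tids  # stands in for wmi-peer-flush-tids
        self._delay = flush_delay_sec(beacon_interval_tu)
        self._sleep = sleep

    def tx(self, frame, peer, unicast_to_known_station):
        # Holding the lock across the whole sequence keeps a second
        # management frame from being queued behind this one.
        with self._lock:
            self._send(frame, peer)
            if unicast_to_known_station:
                self._sleep(self._delay)
                self._flush_tids(peer)
```

The cost is visible right in the model: every unicast mgmt frame to a
known station holds up all other mgmt tx for two beacon intervals.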

> Any idea why it wasn't written like that to begin with?

Beats me.

Michał