Firmware crash when sending large numbers of forwarded packets

Mon Mar 10 06:10:51 EDT 2014

On 8 March 2014 09:20, Avery Pennarun <apenwarr at gmail.com> wrote:
> On Sat, Mar 8, 2014 at 2:03 AM, Kalle Valo <kvalo at qca.qualcomm.com> wrote:
>> Avery Pennarun <apenwarr at gmail.com> writes:
>>> I'm having a problem where if I transmit too fast out the ath10k
>>> interface in AP mode, I get a near-immediate firmware crash.
>>
>> [...]
>>
>>> Versions:
>>> - kernel is based on current kvalo/for-linville branch (should I try
>>> something else?) but seems to be the same in linux-next-20140114 so I
>>> don't think this behaviour has changed lately.
>>
>> I do not recommend using for-linville branch for anything. As the name
>> implies, it's only for John Linville to pull ath10k and ath6kl changes
>> to his tree.
>>
>> What I recommend is to use the master branch of my ath.git tree. That's
>> fairly recent wireless-testing (max 2 weeks old) plus latest ath10k +
>> ath6kl patches I have (i.e. a merge of wireless-testing and my ath-next
>> branch).
>
> Ok, thanks.  We're using a fairly old kernel on our device right now
> (3.2.26) so we're using the ath10k driver from linux-backports.  This
> means it's a little tricky to pick an arbitrary version if it has
> diverged too far from linux/master or linux-next.  I did try a few
> different versions though and they did the same thing.
>
>>> - firmware version 10.1.467.2-1, but also tested with 10.1.467.1-1
>>> with no difference.
>>>
>>> I assume other people are not experiencing this or they would have
>>> mentioned it by now.  What can I do to help debug this?
>>
>> We have reported the issue to the firmware team and got some feedback
>> already. Hopefully we know more early next week.
>
> Thanks!
>
> Another update.  On a whim, based on the earlier mention that problems
> might be related to extra burstiness of forwarding vs. local traffic
> generation, I decided to add a udelay() before transmitting each
> packet.  I started with udelay(1000) and the problem went away
> (although of course performance was terrible).  I slowly reduced the
> delay until I reached ndelay(1), and the problem stayed gone.  So I
> tried a mb() instead:
>
> diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
> index a79499c..a808d82 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.c
> +++ b/drivers/net/wireless/ath/ath10k/ce.c
> @@ -291,6 +291,7 @@ int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
>  	if (ret)
>  		return ret;
>  
> +	mb();
>  	if (unlikely(CE_RING_DELTA(nentries_mask,
>  				   write_index, sw_index - 1) <= 0)) {
>  		ret = -ENOSR;
> --
> 1.9.0.279.gdc9e3eb
>
>
> Somehow this eliminates my firmware crashes.  It's extremely reliable;
> add this line and my crashes go away.  Remove this line and my UDP
> iperf can crash the firmware in a couple of seconds.
>
> For this particular test I was using a backports built from linux
> v3.11.8 merged with your ath10k-stable-3.11-8 tag.
>
> Any idea why this would make any difference?

The FW dump supposedly indicates that the firmware saw a tx request with a duplicate msdu_id.

ath10k fills in a tx descriptor. The descriptor contains an id which
is used for completion handling (the FW signals which id completed).
ath10k uses a spinlock-protected bitmap to manage these ids.
Descriptors are allocated from a DMA pool (consistent DMA memory).
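A minimal userspace sketch of that id bookkeeping, assuming a pthread mutex in place of the driver's spinlock and a hypothetical limit of 64 ids; msdu_id_pool, msdu_id_alloc() and msdu_id_free() are illustrative names, not the ath10k API. It shows why allocation alone should not produce duplicates: an id stays reserved until its completion frees it.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

#define MAX_MSDU_IDS 64u                /* hypothetical limit */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_MSDU_IDS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct msdu_id_pool {
	pthread_mutex_t lock;               /* stands in for the driver's spinlock */
	unsigned long used[BITMAP_WORDS];   /* bit set => id is in flight */
};

/* Reserve the lowest free id; returns -1 if every id is in flight. */
static int msdu_id_alloc(struct msdu_id_pool *pool)
{
	int id = -1;

	pthread_mutex_lock(&pool->lock);
	for (unsigned int i = 0; i < MAX_MSDU_IDS; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_WORD);

		if (!(pool->used[i / BITS_PER_WORD] & mask)) {
			pool->used[i / BITS_PER_WORD] |= mask;
			id = (int)i;
			break;
		}
	}
	pthread_mutex_unlock(&pool->lock);
	return id;
}

/* Called on tx completion, when the FW reports which id finished. */
static void msdu_id_free(struct msdu_id_pool *pool, int id)
{
	assert(id >= 0 && (unsigned int)id < MAX_MSDU_IDS);

	unsigned long mask = 1UL << ((unsigned int)id % BITS_PER_WORD);

	pthread_mutex_lock(&pool->lock);
	/* Completing an id that is not in flight would be a bug. */
	assert(pool->used[(unsigned int)id / BITS_PER_WORD] & mask);
	pool->used[(unsigned int)id / BITS_PER_WORD] &= ~mask;
	pthread_mutex_unlock(&pool->lock);
}
```

Because freed ids are handed out again (lowest free bit first in this sketch), an id the device reads from a stale descriptor can legitimately belong to a different frame that is in flight right now.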

It is highly unlikely for ath10k to pick a duplicate msdu_id in the
first place; you'd have to assume the spinlock failed, in which case
your system would be in a pretty fun state. That leaves two suspects:
either low-level chunk submission ordering is at play, or DMA goes crazy.

The descriptor is transferred in two chunks over the CE ring.
ce_send_nolock() is used to submit each one separately (via pci_tx_sg).
The first chunk contains the msdu_id; the second is a partial copy of
the MSDU, used as a frame prefetch for the FW classification engine.
Once the second chunk is submitted, the CE ring buffer write index is
written to the iomap.
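The two-chunk submission can be modelled roughly as below. This is a single-threaded sketch under stated assumptions: an 8-entry ring, hypothetical names (ce_ring, ring_fill(), tx_two_chunks()), a plain variable standing in for the write index register, and no modelling of completions (sw_index never advances) or of DMA. The free-slot test keeps one slot empty, in the same spirit as the CE_RING_DELTA() check in the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

#define NENTRIES 8u                     /* hypothetical; must be a power of two */
#define NENTRIES_MASK (NENTRIES - 1u)

struct ring_entry {
	uint64_t paddr;                 /* bus address of the chunk */
	uint32_t nbytes;                /* chunk length */
};

struct ce_ring {
	struct ring_entry entries[NENTRIES];
	unsigned int write_index;       /* next slot the host will fill */
	unsigned int sw_index;          /* oldest slot not yet completed */
	unsigned int hw_write_index;    /* stand-in for the write index register */
};

/* Free slots; one slot is always left empty so "full" != "empty". */
static unsigned int ring_free(const struct ce_ring *r)
{
	return (r->sw_index - 1u - r->write_index) & NENTRIES_MASK;
}

/* Fill one ring entry without telling the device about it yet. */
static void ring_fill(struct ce_ring *r, uint64_t paddr, uint32_t nbytes)
{
	assert(ring_free(r) > 0);
	r->entries[r->write_index].paddr = paddr;
	r->entries[r->write_index].nbytes = nbytes;
	r->write_index = (r->write_index + 1u) & NENTRIES_MASK;
}

/* Queue the tx descriptor (carrying the msdu_id) and the frame prefetch
 * as two entries, then publish the new write index exactly once. */
static int tx_two_chunks(struct ce_ring *r,
			 uint64_t desc_paddr, uint32_t desc_len,
			 uint64_t frag_paddr, uint32_t frag_len)
{
	if (ring_free(r) < 2)
		return -1;              /* ring full; caller retries later */
	ring_fill(r, desc_paddr, desc_len);
	ring_fill(r, frag_paddr, frag_len);
	r->hw_write_index = r->write_index;  /* "written to the iomap" */
	return 0;
}
```

Checking for two free slots up front is a choice made for the sketch, so a descriptor is never published without its prefetch chunk; the point that matters here is that the index is written only once, after both entries.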

If I assume this is a DMA coherency issue, then the msdu_id the device
sees is a stale one (the memory has been overwritten, but the new value
hasn't been flushed from the CPU caches yet). In that case this is a
platform bug, not an ath10k one.

If I assume this is a chunk submission ordering issue (the CE ring
item is updated _after_ the ring index in the iomap is updated), then
the device uses an old tx descriptor pointer and therefore an old
msdu_id, or one that has been re-used and is currently in use
(remember, all descriptors come from a DMA pool, which I assume
re-uses memory chunks). In that case this is an ath10k bug.

The latter is a little more plausible because mb() fixes it. udelay()
might implicitly do the same thing.
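The ordering argument can be illustrated with a userspace analogue using C11 atomics and pthreads (a sketch, not kernel code): one thread plays the host, filling ring entries and publishing an index; the other plays the device, reading the index and then the entries. All names and the 4096-entry size are hypothetical, and wrap-around is not modelled. With a release store paired with an acquire load, the consumer can never see an index that runs ahead of the entry contents.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_ENTRIES 4096u              /* hypothetical; no wrap-around */

struct demo_ring {
	unsigned int msdu_id[DEMO_ENTRIES]; /* plain memory, like ring entries */
	atomic_uint write_index;            /* stand-in for the index register */
	unsigned int n;                     /* number of entries to submit */
	unsigned int mismatches;            /* stale ids seen by the consumer */
};

/* "Host" side: fill an entry, then publish the index with release. */
static void *host_producer(void *arg)
{
	struct demo_ring *r = arg;

	for (unsigned int i = 0; i < r->n; i++) {
		r->msdu_id[i] = i + 1;
		/* Release: the entry write above cannot become visible after
		 * the new index, roughly the role suspected for the mb(). */
		atomic_store_explicit(&r->write_index, i + 1, memory_order_release);
	}
	return NULL;
}

/* "Device" side: read the index with acquire, then read the entries. */
static void *device_consumer(void *arg)
{
	struct demo_ring *r = arg;
	unsigned int done = 0;

	while (done < r->n) {
		unsigned int idx = atomic_load_explicit(&r->write_index,
							memory_order_acquire);
		for (; done < idx; done++)
			if (r->msdu_id[done] != done + 1)
				r->mismatches++;    /* would be a stale msdu_id */
	}
	return NULL;
}

/* Run one producer and one consumer; returns the number of stale ids
 * seen (0 expected), or UINT_MAX if a thread could not be started. */
static unsigned int run_ordering_demo(unsigned int n)
{
	struct demo_ring r = { .n = n < DEMO_ENTRIES ? n : DEMO_ENTRIES };
	pthread_t prod, cons;

	atomic_init(&r.write_index, 0);
	if (pthread_create(&prod, NULL, host_producer, &r) != 0)
		return UINT_MAX;
	if (pthread_create(&cons, NULL, device_consumer, &r) != 0) {
		pthread_join(prod, NULL);
		return UINT_MAX;
	}
	pthread_join(prod, NULL);
	pthread_join(cons, NULL);
	assert(atomic_load(&r.write_index) == r.n);  /* everything published */
	return r.mismatches;
}
```

Weakening both orders to memory_order_relaxed would turn the entry accesses into a data race, but on x86 such a test would probably still pass, so this shows the correct pattern rather than reproducing the crash. On the kernel side, the usual tool for ordering descriptor writes ahead of the MMIO index write is a write barrier such as wmb().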

Michał