[OpenWrt-Devel] [PATCH] Fix 'Dropping frame due to full tx queue' for Ralink wifi get stuck.

Mikko Hissa mikko.hissa at werzek.com
Fri Sep 11 10:16:35 EDT 2015


> On 11 Sep 2015, at 15:43, N.Leiten <nickleiten at gmail.com> wrote:
> 
> In email dated Пятница - 11 сентября 2015 13:49:26 user Felix Fietkau wrote:
>> On 2015-09-11 13:33, N.Leiten wrote:
>>> Fix instability of Ralink WiFi general queue management on high load.
>>> rt2x00 driver logs in dmesg "Dropping frame due to full queue ..." several times and at some point get stuck.
>>> 
>>> Solutions in patch:
>>> 1) Increasing number of frames in each TX queue helps with speed and
>>> decreases queue overflows. Actually 256 frames can be increased to
>>> 512 (this number of frames used in proprietary drivers for every
>>> queue).
>> 512 frames seems to be overly excessive. Ever heard of bufferbloat?
>> It seems to me that the driver should simply call ieee80211_stop_queue
>> earlier to ensure that mac80211 does not attempt to fill the queues as
>> much. The driver queue length should probably be around 64 or less.
>> 
> 
> I agree. The patch was made for internal use on own hardware reference design in our company, so I had to look through proprietary and opensource drivers to see the difference. The point was to use recent versions of OpenWRT with mac80211 stack because proprietary still uses linux-2.6. And it makes porting full of pain in nails and bloody eyes reading that piece of code. For now mac80211 is preferrable.
> As to bufferbloat - yes I'm aware of it. Maybe I'm wrong in observations, but simply increasing queue num to 128 frames make driver more stable but not enough, so I started digging in ralink drivers for maximum value, tried it and it increased stability even more, but still hang at some point. So I decreased it to 256. Well, I'll try to remove this part and test again.

I think I solved the stuck tx-queue issue a few years ago by increasing the size of the txstatus fifo. Now looking at the code, I suggest you to try the following. 

in rt2800mmio.c:
	static void rt2800mmio_txstatus_interrupt(struct rt2x00_dev *rt2x00dev…
		I don’t know to what end was this added here but anyways remove the read limit, since it’s obvious that there might be more than just one tx-queue being used. Exiting the interrupt handler here with half full hw fifo (16 entries) will lead to lost txstatuses for sure:
		if (++i >= rt2x00dev->tx->limit)
                        break;

in rt2x00dev.c
	static int rt2x00lib_probe_hw(struct rt2x00_dev *rt2x00dev)…
		I used a static value here, but it seems that the hw might generate excess txstatuses (who knows why). So bigger is better!
		-int kfifo_size = roundup_pow_of_two(rt2x00dev->ops->tx_queues * rt2x00dev->tx->limit * sizeof(u32));
		+int kfifo_size = roundup_pow_of_two(2048 * sizeof(u32));

With these you should no longer be experiencing stuck tx-queues!

> 
>>> 2) Setting number of frames in TX/RX queues to equal values
>>> resolves async speed behaviour with better RX on AP-side (uplink from
>>> STAs), where it needs to be at least equal or better on TX queue on
>>> AP (download to STA).
>> That also doesn't make much sense to me as queueing behavior is
>> completely different between rx and tx. Rx queue processing speed is
>> determined by CPU speed, Tx queue processing speed is determined by
>> shared medium speed.
>> 
> 
> I agree, that was another step in digging. As I can understand it's better even use NAPI for RX queue. But I found only one mention of it in mac80211 sources, need to explore this.

The driver already uses tasklets and rx-interrupts stay disabled until the driver "catches up” with the rx-queue. So, there’s little to none performance gain here with napi. Although, I did see a noticeable performance increase by enabling delayed rx-interrupts from the hw. As if the hw is sending a BA every time when RX_CRX_IDX register is written? 

> I'll remove it, test again and make new patch. Can you direct me with this kind of behaviour? On rt3092 (2T2R) we can get 30-40Mbit/s download and 80-120Mbit/s upload speeds in separate tests, on 1T1R and 2T2R stations.
> 
>>> 3) In rt2x00mac.c additional check for queue
>>> full added and reassignment in this case, so interface will not drop
>>> frame.
>> Why? If the queues are full, it's better to just drop packets instead of
>> making the problem worse.
>> 
> 
> It was the simpliest solution at this point. I'll try to use ieee80211_stop_queue instead, but don't know how much time it'll consume.
> Maybe there's problem in "kick/pause" queue mechanism? I managed to explore behaviour after hang-point. It seems that only QID_AC_* queues stuck, station assoc/auth goes fine but actual data not transmitted on AP-side, but it receives DHCP-requests from station and it looks like no network access on hanged AP.

When the queue is stuck, you can see that the queue is filled to it’s threshold and it’s paused. So, the kick&pause works as it should.

> 
>>> 4) Fixes in queue initialization. Default values for AC_BK,
>>> AC_BE, AC_VI, AC_VO set from WMM.
>> Why do you hardcode that stuff inside the driver? What difference do
>> these values make?
>> 
>>> Tested on RT3883, RT5350, MT7620 SoCs and on RT3092 pcie interface for 10 days.
>>> 
>>> I'm planning to send this patch to mac80211 soon, but need to be sure that it works on other Ralink/Mediatek platforms and it's appropriate to do so.
>>> 
>>> 
>>> Signed-off-by: Nick Leiten <nickleiten at gmail.com>
>>> diff --git a/package/kernel/mac80211/patches/999-rt2x00-queue-update.patch b/package/kernel/mac80211/patches/999-rt2x00-queue-update.patch
>>> new file mode 100644
>>> index 0000000..9239bec
>>> --- /dev/null
>>> +++ b/package/kernel/mac80211/patches/999-rt2x00-queue-update.patch
>>> @@ -0,0 +1,142 @@
>>> +Only in compat-wireless-2015-03-09/drivers/net/wireless/rt2x00: limit
>>> +diff -c -r compat-wireless-2015-03-09-orig/drivers/net/wireless/rt2x00/rt2800mmio.c compat-wireless-2015-03-09/drivers/net/wireless/rt2x00/rt2800mmio.c
>>> +*** compat-wireless-2015-03-09-orig/drivers/net/wireless/rt2x00/rt2800mmio.c	2015-06-16 13:02:30.000000000 +0300
>>> +--- compat-wireless-2015-03-09/drivers/net/wireless/rt2x00/rt2800mmio.c	2015-09-04 11:50:09.665148666 +0300
>>> +***************
>>> +*** 700,706 ****
>>> +  
>>> +  	switch (queue->qid) {
>>> +  	case QID_RX:
>>> +! 		queue->limit = 128;
>>> +  		queue->data_size = AGGREGATION_SIZE;
>>> +  		queue->desc_size = RXD_DESC_SIZE;
>>> +  		queue->winfo_size = rxwi_size;
>>> +--- 700,706 ----
>>> +  
>>> +  	switch (queue->qid) {
>>> +  	case QID_RX:
>>> +! 		queue->limit = 256;
>>> +  		queue->data_size = AGGREGATION_SIZE;
>>> +  		queue->desc_size = RXD_DESC_SIZE;
>>> +  		queue->winfo_size = rxwi_size;
>>> +***************
>>> +*** 711,717 ****
>>> +  	case QID_AC_VI:
>>> +  	case QID_AC_BE:
>>> +  	case QID_AC_BK:
>>> +! 		queue->limit = 64;
>>> +  		queue->data_size = AGGREGATION_SIZE;
>>> +  		queue->desc_size = TXD_DESC_SIZE;
>>> +  		queue->winfo_size = txwi_size;
>>> +--- 711,717 ----
>>> +  	case QID_AC_VI:
>>> +  	case QID_AC_BE:
>>> +  	case QID_AC_BK:
>>> +! 		queue->limit = 256;
>>> +  		queue->data_size = AGGREGATION_SIZE;
>>> +  		queue->desc_size = TXD_DESC_SIZE;
>>> +  		queue->winfo_size = txwi_size;
>> Wrong patch style, please run make package/mac80211/refresh.
>> 
> Ok, will fix it.
>> - Felix
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel at lists.openwrt.org <mailto:openwrt-devel at lists.openwrt.org>
> https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel <https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.infradead.org/pipermail/openwrt-devel/attachments/20150911/a42f68ac/attachment.htm>
-------------- next part --------------
_______________________________________________
openwrt-devel mailing list
openwrt-devel at lists.openwrt.org
https://lists.openwrt.org/cgi-bin/mailman/listinfo/openwrt-devel


More information about the openwrt-devel mailing list