ath10k wake_tx_queue issues

Niklas Cassel niklas.cassel at linaro.org
Tue May 15 06:45:16 PDT 2018


Hello mac80211 and ath10k people

Using ath10k, TX stops working when running iperf

[  3]  0.0- 1.0 sec  2.00 MBytes  16.8 Mbits/sec
[  3]  1.0- 2.0 sec  3.12 MBytes  26.2 Mbits/sec
[  3]  2.0- 3.0 sec  3.25 MBytes  27.3 Mbits/sec
[  3]  3.0- 4.0 sec   655 KBytes  5.36 Mbits/sec
[  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-10.3 sec  9.01 MBytes  7.32 Mbits/sec

The problem can be reproduced without specifying a send buffer size,
however, specifying a small send buffer helps to reproduce the problem faster.

What happens is that iperf gets -EAGAIN on write().
It continues to get -EAGAIN, even if iperf runs for e.g. 300 seconds.
The reason why we get -EAGAIN is because the send socket buffer is full
(iperf uses non-blocking I/O).

The problem is that the mac80211 wake_tx_queue callback never comes.

I guess the best way to describe this is to show my ftrace buffer:

     ksoftirqd/2-21    [002] .ns4    74.711744: ath10k_htt_tx_dec_pending: num_pen: 60
     ksoftirqd/2-21    [002] .ns3    74.711761: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 60 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711765: ath10k_htt_tx_inc_pending: num_pen: 61
     ksoftirqd/2-21    [002] .ns4    74.711781: ath10k_htt_tx_inc_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns4    74.711787: ath10k_htt_tx_dec_pending: num_pen: 61
     ksoftirqd/2-21    [002] .ns3    74.711803: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 61 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711807: ath10k_htt_tx_inc_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns4    74.711823: ath10k_htt_tx_inc_pending: num_pen: 63
     ksoftirqd/2-21    [002] .ns4    74.711829: ath10k_htt_tx_dec_pending: num_pen: 62
     ksoftirqd/2-21    [002] .ns3    74.711845: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 62 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711849: ath10k_htt_tx_inc_pending: num_pen: 63
     ksoftirqd/2-21    [002] .ns4    74.711865: ath10k_htt_tx_inc_pending: num_pen: 64
     ksoftirqd/2-21    [002] dns5    74.711870: stop_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711874: stop_queue: phy0 queue:1, reason:0
     ksoftirqd/2-21    [002] dns5    74.711877: stop_queue: phy0 queue:2, reason:0
     ksoftirqd/2-21    [002] dns5    74.711880: stop_queue: phy0 queue:3, reason:0
     ksoftirqd/2-21    [002] dns5    74.711882: stop_queue: phy0 queue:4, reason:0
     ksoftirqd/2-21    [002] dns5    74.711885: stop_queue: phy0 queue:5, reason:0
     ksoftirqd/2-21    [002] dns5    74.711887: stop_queue: phy0 queue:6, reason:0
     ksoftirqd/2-21    [002] dns5    74.711890: stop_queue: phy0 queue:7, reason:0
     ksoftirqd/2-21    [002] dns5    74.711892: stop_queue: phy0 queue:8, reason:0
     ksoftirqd/2-21    [002] dns5    74.711895: stop_queue: phy0 queue:9, reason:0
     ksoftirqd/2-21    [002] dns5    74.711898: stop_queue: phy0 queue:10, reason:0
     ksoftirqd/2-21    [002] dns5    74.711900: stop_queue: phy0 queue:11, reason:0
     ksoftirqd/2-21    [002] dns5    74.711903: stop_queue: phy0 queue:12, reason:0
     ksoftirqd/2-21    [002] dns5    74.711905: stop_queue: phy0 queue:13, reason:0
     ksoftirqd/2-21    [002] dns5    74.711908: stop_queue: phy0 queue:14, reason:0
     ksoftirqd/2-21    [002] dns5    74.711910: stop_queue: phy0 queue:15, reason:0
     ksoftirqd/2-21    [002] .ns4    74.711917: ath10k_htt_tx_dec_pending: num_pen: 63
     ksoftirqd/2-21    [002] dns5    74.711922: wake_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711929: wake_queue: phy0 queue:15, reason:0
     ksoftirqd/2-21    [002] .ns3    74.711948: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 63 queue: 0
     ksoftirqd/2-21    [002] .ns4    74.711952: ath10k_htt_tx_inc_pending: num_pen: 64
     ksoftirqd/2-21    [002] dns5    74.711956: stop_queue: phy0 queue:0, reason:0
     ksoftirqd/2-21    [002] dns5    74.711959: stop_queue: phy0 queue:1, reason:0
     ksoftirqd/2-21    [002] dns5    74.711962: stop_queue: phy0 queue:2, reason:0
     ksoftirqd/2-21    [002] dns5    74.711964: stop_queue: phy0 queue:3, reason:0
     ksoftirqd/2-21    [002] dns5    74.711967: stop_queue: phy0 queue:4, reason:0
     ksoftirqd/2-21    [002] dns5    74.711969: stop_queue: phy0 queue:5, reason:0
     ksoftirqd/2-21    [002] dns5    74.711972: stop_queue: phy0 queue:6, reason:0
     ksoftirqd/2-21    [002] dns5    74.711974: stop_queue: phy0 queue:7, reason:0
     ksoftirqd/2-21    [002] dns5    74.711977: stop_queue: phy0 queue:8, reason:0
     ksoftirqd/2-21    [002] dns5    74.711980: stop_queue: phy0 queue:9, reason:0
     ksoftirqd/2-21    [002] dns5    74.711982: stop_queue: phy0 queue:10, reason:0
     ksoftirqd/2-21    [002] dns5    74.711985: stop_queue: phy0 queue:11, reason:0
     ksoftirqd/2-21    [002] dns5    74.711987: stop_queue: phy0 queue:12, reason:0
     ksoftirqd/2-21    [002] dns5    74.711990: stop_queue: phy0 queue:13, reason:0
     ksoftirqd/2-21    [002] dns5    74.711992: stop_queue: phy0 queue:14, reason:0
     ksoftirqd/2-21    [002] dns5    74.711995: stop_queue: phy0 queue:15, reason:0

(since we just called ieee80211_stop_queues(), I wouldn't expect to see
wake_tx_queue being called directly after)

     ksoftirqd/2-21    [002] .ns3    74.712024: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 1 byte_cnt: 1534 f_txq: frame_cnt: 1 byte_cnt: 1534 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712040: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 2 byte_cnt: 3068 f_txq: frame_cnt: 2 byte_cnt: 3068 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712055: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 3 byte_cnt: 4602 f_txq: frame_cnt: 3 byte_cnt: 4602 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712069: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 4 byte_cnt: 6136 f_txq: frame_cnt: 4 byte_cnt: 6136 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712084: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 5 byte_cnt: 7670 f_txq: frame_cnt: 5 byte_cnt: 7670 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712099: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 6 byte_cnt: 9204 f_txq: frame_cnt: 6 byte_cnt: 9204 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712113: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 7 byte_cnt: 10738 f_txq: frame_cnt: 7 byte_cnt: 10738 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712128: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 8 byte_cnt: 12272 f_txq: frame_cnt: 8 byte_cnt: 12272 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712142: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 9 byte_cnt: 13806 f_txq: frame_cnt: 9 byte_cnt: 13806 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712157: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 10 byte_cnt: 15340 f_txq: frame_cnt: 10 byte_cnt: 15340 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712171: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 11 byte_cnt: 16874 f_txq: frame_cnt: 11 byte_cnt: 16874 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712186: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 12 byte_cnt: 18408 f_txq: frame_cnt: 12 byte_cnt: 18408 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712200: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 13 byte_cnt: 19942 f_txq: frame_cnt: 13 byte_cnt: 19942 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712215: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 14 byte_cnt: 21476 f_txq: frame_cnt: 14 byte_cnt: 21476 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712229: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 15 byte_cnt: 23010 f_txq: frame_cnt: 15 byte_cnt: 23010 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712244: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 16 byte_cnt: 24544 f_txq: frame_cnt: 16 byte_cnt: 24544 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712258: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 17 byte_cnt: 26078 f_txq: frame_cnt: 17 byte_cnt: 26078 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712273: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 18 byte_cnt: 27612 f_txq: frame_cnt: 18 byte_cnt: 27612 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712287: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 19 byte_cnt: 29146 f_txq: frame_cnt: 19 byte_cnt: 29146 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712302: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 20 byte_cnt: 30680 f_txq: frame_cnt: 20 byte_cnt: 30680 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712316: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 21 byte_cnt: 32214 f_txq: frame_cnt: 21 byte_cnt: 32214 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712330: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 22 byte_cnt: 33748 f_txq: frame_cnt: 22 byte_cnt: 33748 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712345: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 23 byte_cnt: 35282 f_txq: frame_cnt: 23 byte_cnt: 35282 num_pen: 64 queue: 0
     ksoftirqd/2-21    [002] .ns3    74.712359: ath10k_mac_op_wake_tx_queue: txq: frame_cnt: 24 byte_cnt: 36816 f_txq: frame_cnt: 24 byte_cnt: 36816 num_pen: 64 queue: 0
  ksdioirqd/mmc2-139   [002] ...1    74.712411: ath10k_htt_tx_dec_pending: num_pen: 63
  ksdioirqd/mmc2-139   [002] d..2    74.712417: wake_queue: phy0 queue:0, reason:0
  ksdioirqd/mmc2-139   [002] d..2    74.712424: wake_queue: phy0 queue:15, reason:0

here we just called ieee80211_wake_queue(), however, wake_tx_queue callback
(ath10k_mac_op_wake_tx_queue) is never seen again...

  ksdioirqd/mmc2-139   [002] ...1    74.712454: ath10k_htt_tx_dec_pending: num_pen: 62
  ksdioirqd/mmc2-139   [002] ...1    74.712468: ath10k_htt_tx_dec_pending: num_pen: 61
  ksdioirqd/mmc2-139   [002] ...1    74.718078: ath10k_htt_tx_dec_pending: num_pen: 60
  ksdioirqd/mmc2-139   [002] ...1    74.718103: ath10k_htt_tx_dec_pending: num_pen: 59
  ksdioirqd/mmc2-139   [002] ...1    74.718116: ath10k_htt_tx_dec_pending: num_pen: 58
  ksdioirqd/mmc2-139   [002] ...1    74.718131: ath10k_htt_tx_dec_pending: num_pen: 57
  ksdioirqd/mmc2-139   [002] ...1    74.718143: ath10k_htt_tx_dec_pending: num_pen: 56
  ksdioirqd/mmc2-139   [002] ...1    74.718155: ath10k_htt_tx_dec_pending: num_pen: 55
  ksdioirqd/mmc2-139   [002] ...1    74.718250: ath10k_htt_tx_dec_pending: num_pen: 54
  ksdioirqd/mmc2-139   [002] ...1    74.718273: ath10k_htt_tx_dec_pending: num_pen: 53
  ksdioirqd/mmc2-139   [002] ...1    74.718286: ath10k_htt_tx_dec_pending: num_pen: 52
  ksdioirqd/mmc2-139   [002] ...1    74.718298: ath10k_htt_tx_dec_pending: num_pen: 51
  ksdioirqd/mmc2-139   [002] ...1    74.718310: ath10k_htt_tx_dec_pending: num_pen: 50
  ksdioirqd/mmc2-139   [002] ...1    74.718322: ath10k_htt_tx_dec_pending: num_pen: 49
  ksdioirqd/mmc2-139   [002] ...1    74.718334: ath10k_htt_tx_dec_pending: num_pen: 48
  ksdioirqd/mmc2-139   [002] ...1    74.718346: ath10k_htt_tx_dec_pending: num_pen: 47
  ksdioirqd/mmc2-139   [002] ...1    74.718622: ath10k_htt_tx_dec_pending: num_pen: 46
  ksdioirqd/mmc2-139   [002] ...1    74.718647: ath10k_htt_tx_dec_pending: num_pen: 45
  ksdioirqd/mmc2-139   [002] ...1    74.718660: ath10k_htt_tx_dec_pending: num_pen: 44
  ksdioirqd/mmc2-139   [002] ...1    74.719050: ath10k_htt_tx_dec_pending: num_pen: 43
  ksdioirqd/mmc2-139   [002] ...1    74.719154: ath10k_htt_tx_dec_pending: num_pen: 42
  ksdioirqd/mmc2-139   [002] ...1    74.719255: ath10k_htt_tx_dec_pending: num_pen: 41
  ksdioirqd/mmc2-139   [002] ...1    74.719278: ath10k_htt_tx_dec_pending: num_pen: 40
  ksdioirqd/mmc2-139   [002] ...1    74.719371: ath10k_htt_tx_dec_pending: num_pen: 39
  ksdioirqd/mmc2-139   [002] ...1    74.719394: ath10k_htt_tx_dec_pending: num_pen: 38
  ksdioirqd/mmc2-139   [002] ...1    74.719493: ath10k_htt_tx_dec_pending: num_pen: 37
  ksdioirqd/mmc2-139   [002] ...1    74.719516: ath10k_htt_tx_dec_pending: num_pen: 36
  ksdioirqd/mmc2-139   [002] ...1    74.719529: ath10k_htt_tx_dec_pending: num_pen: 35
  ksdioirqd/mmc2-139   [002] ...1    74.719541: ath10k_htt_tx_dec_pending: num_pen: 34
  ksdioirqd/mmc2-139   [002] ...1    74.719554: ath10k_htt_tx_dec_pending: num_pen: 33
  ksdioirqd/mmc2-139   [002] ...1    74.719566: ath10k_htt_tx_dec_pending: num_pen: 32
  ksdioirqd/mmc2-139   [002] ...1    74.719658: ath10k_htt_tx_dec_pending: num_pen: 31
  ksdioirqd/mmc2-139   [002] ...1    74.719681: ath10k_htt_tx_dec_pending: num_pen: 30
  ksdioirqd/mmc2-139   [002] ...1    74.719694: ath10k_htt_tx_dec_pending: num_pen: 29
  ksdioirqd/mmc2-139   [002] ...1    74.719708: ath10k_htt_tx_dec_pending: num_pen: 28
  ksdioirqd/mmc2-139   [002] ...1    74.719803: ath10k_htt_tx_dec_pending: num_pen: 27
  ksdioirqd/mmc2-139   [002] ...1    74.719826: ath10k_htt_tx_dec_pending: num_pen: 26
  ksdioirqd/mmc2-139   [002] ...1    74.719839: ath10k_htt_tx_dec_pending: num_pen: 25
  ksdioirqd/mmc2-139   [002] ...1    74.719851: ath10k_htt_tx_dec_pending: num_pen: 24
  ksdioirqd/mmc2-139   [002] ...1    74.719864: ath10k_htt_tx_dec_pending: num_pen: 23
  ksdioirqd/mmc2-139   [002] ...1    74.719956: ath10k_htt_tx_dec_pending: num_pen: 22
  ksdioirqd/mmc2-139   [002] ...1    74.719978: ath10k_htt_tx_dec_pending: num_pen: 21
  ksdioirqd/mmc2-139   [002] ...1    74.719992: ath10k_htt_tx_dec_pending: num_pen: 20
  ksdioirqd/mmc2-139   [002] ...1    74.720004: ath10k_htt_tx_dec_pending: num_pen: 19
  ksdioirqd/mmc2-139   [002] ...1    74.720096: ath10k_htt_tx_dec_pending: num_pen: 18
  ksdioirqd/mmc2-139   [002] ...1    74.720137: ath10k_htt_tx_dec_pending: num_pen: 17
  ksdioirqd/mmc2-139   [002] ...1    74.720150: ath10k_htt_tx_dec_pending: num_pen: 16
  ksdioirqd/mmc2-139   [002] ...1    74.720163: ath10k_htt_tx_dec_pending: num_pen: 15
  ksdioirqd/mmc2-139   [002] ...1    74.720256: ath10k_htt_tx_dec_pending: num_pen: 14
  ksdioirqd/mmc2-139   [002] ...1    74.720279: ath10k_htt_tx_dec_pending: num_pen: 13
  ksdioirqd/mmc2-139   [002] ...1    74.720292: ath10k_htt_tx_dec_pending: num_pen: 12
  ksdioirqd/mmc2-139   [002] ...1    74.720304: ath10k_htt_tx_dec_pending: num_pen: 11
  ksdioirqd/mmc2-139   [002] ...1    74.720316: ath10k_htt_tx_dec_pending: num_pen: 10
  ksdioirqd/mmc2-139   [002] ...1    74.720396: ath10k_htt_tx_dec_pending: num_pen: 9
  ksdioirqd/mmc2-139   [002] ...1    74.720503: ath10k_htt_tx_dec_pending: num_pen: 8
  ksdioirqd/mmc2-139   [002] ...1    74.720527: ath10k_htt_tx_dec_pending: num_pen: 7
  ksdioirqd/mmc2-139   [002] ...1    74.720540: ath10k_htt_tx_dec_pending: num_pen: 6
  ksdioirqd/mmc2-139   [002] ...1    74.720552: ath10k_htt_tx_dec_pending: num_pen: 5
  ksdioirqd/mmc2-139   [002] ...1    74.720645: ath10k_htt_tx_dec_pending: num_pen: 4
  ksdioirqd/mmc2-139   [002] ...1    74.720668: ath10k_htt_tx_dec_pending: num_pen: 3
  ksdioirqd/mmc2-139   [002] ...1    74.720681: ath10k_htt_tx_dec_pending: num_pen: 2
  ksdioirqd/mmc2-139   [002] ...1    74.720694: ath10k_htt_tx_dec_pending: num_pen: 1
  ksdioirqd/mmc2-139   [002] ...1    74.720709: ath10k_htt_tx_dec_pending: num_pen: 0

ath10k just finished transmission of pending frames.

Half a second later, the send buffer is full, and we start seeing errors
in iperf.

           iperf-181   [001] ....    75.191606: tcp_sendmsg_locked: err: -11
           iperf-181   [001] ....    75.701511: tcp_sendmsg_locked: err: -11
           iperf-181   [001] ....    76.211648: tcp_sendmsg_locked: err: -11



Sure, the regular way ath10k_mac_op_wake_tx_queue is called is via
ieee80211_subif_start_xmit => __ieee80211_subif_start_xmit => ieee80211_xmit_fast
=> ieee80211_queue_skb => drv_wake_tx_queue.

But I was expecting the call to ieee80211_wake_queue to somehow trigger a call
to ath10k_mac_op_wake_tx_queue, since there is still data in the send buffer/
in the ieee80211_txq that needs to be sent, to allow more data to be written to
the socket. But obviously the callback never comes.
Or how else is this supposed to work?


Note that setting ops->wake_tx_queue = NULL; works around the problem
(i.e. let mac80211 use ath10k's tx callback instead of wake_tx_queue callback).
But then there might still be a bug for other drivers to stumble upon.

However, since the only wireless drivers using wake_tx_queue are: ath9k, ath10k,
and mt76, perhaps it is not such a bad idea to use the tx callback instead of
the wake_tx_queue callback.


Kind regards,
Niklas



More information about the ath10k mailing list