[PATCH v2] ath10k: fix wmi mgmt tx queue full due to race condition

Kalle Valo kvalo at codeaurora.org
Thu Jan 28 02:19:57 EST 2021

Miaoqing Pan <miaoqing at codeaurora.org> wrote:

> Failed to transmit wmi management frames:
> [84977.840894] ath10k_snoc a000000.wifi: wmi mgmt tx queue is full
> [84977.840913] ath10k_snoc a000000.wifi: failed to transmit packet, dropping: -28
> [84977.840924] ath10k_snoc a000000.wifi: failed to submit frame: -28
> [84977.840932] ath10k_snoc a000000.wifi: failed to transmit frame: -28
>
> This issue is caused by a race condition between skb_dequeue() and
> __skb_queue_tail(). The 'wmi_mgmt_tx_queue' queue is guarded by two
> different locks, ar->data_lock on the enqueue side and list->lock on
> the dequeue side, so in effect it has no protection at all.
> When ath10k_mgmt_over_wmi_tx_work() and ath10k_mac_tx_wmi_mgmt() run
> concurrently on different CPUs, there is a rare corner case when the
> queue length is 1:
>
>   CPUx (skb_dequeue)                      CPUy (__skb_queue_tail)
>                                           next = list
>                                           prev = next->prev
>   struct sk_buff *skb = skb_peek(list);   WRITE_ONCE(newsk->next, next);
>   WRITE_ONCE(list->qlen, list->qlen - 1); WRITE_ONCE(newsk->prev, prev);
>   next       = skb->next;                 WRITE_ONCE(next->prev, newsk);
>   prev       = skb->prev;                 WRITE_ONCE(prev->next, newsk);
>   skb->next  = skb->prev = NULL;          list->qlen++;
>   WRITE_ONCE(next->prev, prev);
>   WRITE_ONCE(prev->next, next);
>
> If the instruction 'next = skb->next' on CPUx executes before
> 'WRITE_ONCE(prev->next, newsk)' on CPUy, newsk is lost: CPUx reads the
> old 'next' pointer and unlinks the only element, leaving the list empty,
> yet the queue length is still incremented by one. Each occurrence leaks
> one count, so the length eventually reaches the maximum value while the
> queue is actually empty.
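>
> Therefore, remove ar->data_lock and use skb_queue_tail() instead of
> __skb_queue_tail(), so that the enqueue and skb_dequeue() both take the
> queue's own list->lock. Also switch to skb_queue_len_lockless(), since
> the length is now read without holding the lock while several SKBs may
> be queued simultaneously.
>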
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1.c2-00033-QCAHLSWMTPLZ-1
> Signed-off-by: Miaoqing Pan <miaoqing at codeaurora.org>
> Reviewed-by: Brian Norris <briannorris at chromium.org>
> Signed-off-by: Kalle Valo <kvalo at codeaurora.org>

Patch applied to ath-next branch of ath.git, thanks.

b55379e343a3 ath10k: fix wmi mgmt tx queue full due to race condition