BUG and WARNINGs from mt7921s on next-20240916

Alper Nebi Yasak alpernebiyasak at gmail.com
Tue Sep 17 03:08:08 PDT 2024


Hi,

On 2024-09-17 12:15 +03:00, Felix Fietkau wrote:
> On 17.09.24 08:17, Kalle Valo wrote:
>> Lorenzo Bianconi <lorenzo at kernel.org> writes:
>>
>>>> Hi,
>>>>
>>>> I ran into some bug messages while testing linux-next on a MT8186
>>>> Magneton Chromebook (mt8186-corsola-magneton-sku393218). It boots 
>>>> to the OS, but at least Wi-Fi and Bluetooth are unavailable.
>>>>
>>>> As a start, I tried reverting commit abbd838c579e ("Merge tag 
>>>> 'mt76-for-kvalo-2024-09-06' of https://github.com/nbd168/wireless")
>>>> and it works fine after that. Didn't have time to do a full bisect, 
>>>> but will try if nobody has any immediate opinions.
>>>>
>>>> There are a few traces, here's some select lines to catch your attention,
>>>> not sure how informational they are:
>>>>
>>>> [   16.040525] kernel BUG at net/core/skbuff.c:2268!
>>>> [   16.040531] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>>>> [   16.040803] CPU: 3 UID: 0 PID: 526 Comm: mt76-sdio-txrx Not tainted 6.11.0-next-20240916-deb-00002-g7b544e01c649 #1
>>>> [   16.040897] Call trace:
>>>> [   16.040899]  pskb_expand_head+0x2b0/0x3c0
>>>> [   16.040905]  mt76s_tx_run_queue+0x274/0x410 [mt76_sdio]
>>>> [   16.040909]  mt76s_txrx_worker+0xe4/0xac8 [mt76_sdio]
>>>> [   16.040914]  mt7921s_txrx_worker+0x98/0x1e0 [mt7921s]
>>>> [   16.040924]  __mt76_worker_fn+0x80/0x128 [mt76]
>>>> [   16.040934]  kthread+0xe8/0xf8
>>>> [   16.040940]  ret_from_fork+0x10/0x20
>>>
>>> Hi,
>>>
>>> I guess this issue has been introduced by the following commit:
>>>
>>> commit 3688c18b65aeb2a1f2fde108400afbab129a8cc1
>>> Author: Felix Fietkau <nbd at nbd.name>
>>> Date:   Tue Aug 27 11:30:01 2024 +0200
>>>
>>>     wifi: mt76: mt7915: retry mcu messages
>>>
>>>     In some cases MCU messages can get lost. Instead of failing completely,
>>>     attempt to recover by re-sending them.
>>>
>>>     Link: https://patch.msgid.link/20240827093011.18621-14-nbd@nbd.name
>>>     Signed-off-by: Felix Fietkau <nbd at nbd.name>
>>>
>>>
>>> In particular, skb_get() in mt76_mcu_skb_send_and_get_msg() is bumping skb users
>>> refcount (making the skb shared) and pskb_expand_head() (run by __skb_grow() in
>>> mt76s_tx_run_queue()) does not like shared skbs.
>>>
>>> @Felix: any input on it?
> 
> Sorry about that. Please try this patch, it should probably resolve this issue:
> 
> ---
> --- a/drivers/net/wireless/mediatek/mt76/mcu.c
> +++ b/drivers/net/wireless/mediatek/mt76/mcu.c
> @@ -84,13 +84,15 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
>   	mutex_lock(&dev->mcu.mutex);
>   
>   	if (dev->mcu_ops->mcu_skb_prepare_msg) {
> +		orig_skb = skb;
>   		ret = dev->mcu_ops->mcu_skb_prepare_msg(dev, skb, cmd, &seq);
>   		if (ret < 0)
>   			goto out;
>   	}
>   
>   retry:
> -	orig_skb = skb_get(skb);
> +	if (orig_skb)
> +		skb_get(orig_skb);
>   	ret = dev->mcu_ops->mcu_skb_send_msg(dev, skb, cmd, &seq);
>   	if (ret < 0)
>   		goto out;
> @@ -105,7 +107,7 @@ int mt76_mcu_skb_send_and_get_msg(struct mt76_dev *dev, struct sk_buff *skb,
>   	do {
>   		skb = mt76_mcu_get_response(dev, expires);
>   		if (!skb && !test_bit(MT76_MCU_RESET, &dev->phy.state) &&
> -		    retry++ < dev->mcu_ops->max_retry) {
> +		    orig_skb && retry++ < dev->mcu_ops->max_retry) {
>   			dev_err(dev->dev, "Retry message %08x (seq %d)\n",
>   				cmd, seq);
>   			skb = orig_skb;
> 

Tested-by: Alper Nebi Yasak <alpernebiyasak at gmail.com>

Thanks!
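
For anyone reading this in the archive later, here is a rough userspace
sketch of why the unconditional skb_get() tripped the BUG and why the
patch avoids it on this hardware. Everything named model_* or send_* is
hypothetical and only stands in for the kernel helpers. The model assumes
the send path always grows the skb, as the SDIO path in the trace does,
and that a driver without an mcu_skb_prepare_msg hook never takes the
extra reference after the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the refcount matters. */
struct model_skb {
	int users;
};

/* Mirrors skb_get(): take one more reference, return the same buffer. */
static struct model_skb *model_skb_get(struct model_skb *skb)
{
	assert(skb != NULL);
	skb->users++;
	return skb;
}

/* Mirrors skb_shared(): true when more than one reference is held. */
static bool model_skb_shared(const struct model_skb *skb)
{
	return skb->users != 1;
}

/*
 * Stand-in for pskb_expand_head(): returns -1 at the point where the
 * kernel hits BUG_ON() for a shared skb, 0 otherwise.
 */
static int model_expand_head(struct model_skb *skb)
{
	return model_skb_shared(skb) ? -1 : 0;
}

/* Flow before the patch: a reference is always taken before sending. */
static int send_before_patch(struct model_skb *skb)
{
	struct model_skb *orig_skb = model_skb_get(skb);

	(void)orig_skb; /* kept only for a possible retry */
	return model_expand_head(skb); /* send path grows the skb */
}

/*
 * Flow after the patch: orig_skb is set only when the driver provides a
 * prepare hook, so drivers without one never share the skb (and never
 * retry).
 */
static int send_after_patch(struct model_skb *skb, bool has_prepare_hook)
{
	struct model_skb *orig_skb = NULL;

	if (has_prepare_hook)
		orig_skb = skb;
	if (orig_skb)
		model_skb_get(orig_skb);
	return model_expand_head(skb);
}
```

Calling send_before_patch() on a buffer with users == 1 leaves it shared
and the expand step fails, which corresponds to the BUG at
net/core/skbuff.c:2268 above. Calling send_after_patch() with
has_prepare_hook == false leaves users at 1 and the expand step succeeds.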
