[BUG] mt7921e: Intermittent connection failure
silverducks
silverducks at altlinux.org
Thu Apr 9 05:18:27 PDT 2026
On 02/04/2026 01:58, Sean Wang wrote:
> I think the current test setup is still mixing too many variables, so
> it is hard to tell what is actually triggering the issue.
>
> In particular, if the goal is to test the NetworkManager path, the
> script should not also manually manage wpa_supplicant, and iwd should
> not be part of the same test either. NetworkManager normally manages
> the Wi-Fi backend itself, so mixing manual wpa_supplicant handling,
> iwd, and NetworkManager in one setup makes the result difficult to
> interpret.
> Could you first simplify the setup and test one path at a time?
> If you want to test NetworkManager, use only NetworkManager, for
> example by using nmcli to explicitly control the connection steps.
> If you want to test plain wpa_supplicant, stop NetworkManager
> completely and use only wpa_supplicant + wpa_cli. I would suggest
> starting with this path, since that is also the setup I usually use
> for testing.
> If you want to test iwd, please test it separately as well.
Some confusion here: I did test them separately. I left the
wpa_supplicant start in that particular script by mistake. The extra
stop commands for iwd and wpa_supplicant at the end of the loop are also
redundant. I don't think either affects the outcome of the tests, but
just in case, I'll clean up the test scripts and rerun.
I am also attaching all scripts used, alongside the resulting logs, for
clarity.
> Also suggest to avoid suspend/resume or hibernation for now.
> The log you shared includes a clear S4 resume path (ACPI: PM: Waking
> up from system sleep state S4 and pci_pm_restore returns -110), which
> does not match a simple reconnect or module reload test.
Those are the logs I collected when originally diagnosing the issue. The
scripts reload the kernel module instead, because I have since realised
that this is enough to trigger the bug, and it is much faster when
searching for a failstate.
I am attaching logs where module reload is used for all cases.
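A module reload loop that goes through NetworkManager only could look
roughly like the following. This is a minimal sketch, not one of the
attached scripts: the function name, connection name, iteration count,
timeout, and settle delay are all placeholders, and it assumes it is run
as root.

```shell
#!/bin/sh
# Sketch only: reload mt7921e and reconnect through NetworkManager.
# reload_loop CON [COUNT] [TIMEOUT] [SETTLE]
#   CON      NetworkManager connection name (placeholder)
#   COUNT    number of reload cycles (default 50, arbitrary)
#   TIMEOUT  seconds nmcli waits for activation (default 30, arbitrary)
#   SETTLE   seconds to wait after loading the module (default 5, arbitrary)
# Returns 0 if every cycle connected, 1 on the first failed connection,
# 2 if the module itself could not be unloaded or loaded.
reload_loop() {
    con=$1
    count=${2:-50}
    timeout=${3:-30}
    settle=${4:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        modprobe -r mt7921e || { echo "iteration $i: module unload failed"; return 2; }
        modprobe mt7921e || { echo "iteration $i: module load failed"; return 2; }
        # Give NetworkManager time to notice the device again.
        sleep "$settle"
        if ! nmcli --wait "$timeout" connection up id "$con" >/dev/null 2>&1; then
            echo "iteration $i: connection failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count reloads connected"
}
```

Calling `reload_loop MyWifi 50` stops at the first cycle that fails to
connect, which leaves the system in that state for collecting dmesg.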
------------
New data
------------
After some additional testing, I noticed that simply reconnecting enough
times seems to trigger the bug as well. I'll attach an additional script
for this too.
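A reconnect-only loop driven purely by nmcli could look roughly like
this. It is a minimal sketch, not the attached script: the function
name, connection name, iteration count, and timeout are placeholders.

```shell
#!/bin/sh
# Sketch only: repeatedly take a NetworkManager connection down and up.
# reconnect_loop CON [COUNT] [TIMEOUT]
#   CON      NetworkManager connection name (placeholder)
#   COUNT    number of reconnect attempts (default 100, arbitrary)
#   TIMEOUT  seconds nmcli waits for activation (default 30, arbitrary)
# Returns 0 if every attempt connected, 1 on the first failure.
reconnect_loop() {
    con=$1
    count=${2:-100}
    timeout=${3:-30}
    i=1
    while [ "$i" -le "$count" ]; do
        # Errors are ignored here: the connection may already be down.
        nmcli connection down id "$con" >/dev/null 2>&1 || true
        if ! nmcli --wait "$timeout" connection up id "$con" >/dev/null 2>&1; then
            echo "iteration $i: connection failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $count iterations connected"
    return 0
}
```

For example, `reconnect_loop MyWifi 200` runs up to 200 down/up cycles
and stops at the first one that does not come back up.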
Given this, I've looked through the code relevant to the connection
process, and I think I may have found something of interest. The
following patch gets rid of the original lockup state, but introduces a
new error into dmesg.
---
drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
index 462b4a68c4f0..47621ff13839 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76_connac_mcu.c
@@ -1788,7 +1788,7 @@ int mt76_connac_mcu_cancel_hw_scan(struct mt76_phy *phy,
}
return mt76_mcu_send_msg(phy->dev, MCU_CE_CMD(CANCEL_HW_SCAN),
- &req, sizeof(req), false);
+ &req, sizeof(req), true);
}
EXPORT_SYMBOL_GPL(mt76_connac_mcu_cancel_hw_scan);
--
2.50.1
Making this call wait for the response seems to fix the issue, at least
partially. Judging from the relevant logs, it effectively makes the
driver reset on its own whenever there's a problem.
Logs of two runs with a kernel compiled with this patch are attached:
one for the reconnect variant, one for the module reload variant.
------------
Attachments
------------
I've structured the archive with a subfolder for each testing run, each
containing the relevant logs and the script, if applicable. Overview:
iwd-only and wpa_supplicant-only setups work as expected.
NetworkManager with module reload ends up in failstate.
NetworkManager with reconnection ends up in failstate.
Both of the latter start to have additional driver reloads instead of
locking up when run with the patched kernel.
Finally, I am also attaching a case where the problem occurred
immediately after boot.
All failed cases were ended after confirming the failstate.
All successful cases were interrupted manually after a few minutes.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: attachments.tar.gz
Type: application/gzip
Size: 712828 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-mediatek/attachments/20260409/ac31127f/attachment-0001.gz>