Bug in Memory Layout of rx_desc for QCA6174

Francesco Magliocca franciman12 at gmail.com
Fri Jun 18 00:28:51 PDT 2021


Hello everyone,
I have a QCA6174 PCIe board and I am running Linux kernel 5.12.10.
The firmware loaded is:
> [    4.483131] ath10k_pci 0000:02:00.0: qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1a56:143a
> [    4.483136] ath10k_pci 0000:02:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 0 testmode 0
> [    4.483567] ath10k_pci 0000:02:00.0: firmware ver WLAN.RM.4.4.1-00157-QCARMSWPZ-1 api 6 features wowlan,ignore-otp,mfp crc32 90eebefb
> [    4.572730] ath10k_pci 0000:02:00.0: board_file api 2 bmi_id N/A crc32 318825bf
> [    4.665592] ath10k_pci 0000:02:00.0: htt-ver 3.60 wmi-op 4 htt-op 3 cal otp max-sta 32 raw 0 hwcrypto 1

Around six months ago I reported a bug which is still haunting me:
when I am connected to my home Wi-Fi network and my father's Huawei
smartphone is connected to the same network,
my Wi-Fi card hangs and gets stuck, and I have to force a restart of the device.

Note that this problem does not happen if my PC and the smartphone are
connected to different networks (for example,
I tried connecting my PC to the 2.4GHz network and the smartphone to
the 5GHz network, and the bug did not appear).

I bisected the driver changes and found the faulty commit:
e3def6f7ddf88636febb12e1e3e86387a4ce5452

It adds some fields to structures like rx_msdu_start, rx_frag_info, etc.
These changes modify the size of the structures!
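
To illustrate why a size change matters, here is a minimal sketch. The
struct and field names below are hypothetical stand-ins, not the real
ath10k definitions. It only shows that adding one field to a nested
structure moves every member that follows it, so a driver and a firmware
that disagree on the layout would read the same bytes at different offsets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only; NOT the ath10k structs. */
struct demo_msdu_start_old {
	uint32_t info0;
	uint32_t info1;
};

struct demo_msdu_start_new {
	uint32_t info0;
	uint32_t info1;
	uint32_t extra;	/* stands in for a field added by a later commit */
};

struct demo_rx_desc_old {
	struct demo_msdu_start_old msdu_start;
	uint8_t payload[8];
};

struct demo_rx_desc_new {
	struct demo_msdu_start_new msdu_start;
	uint8_t payload[8];
};

/* The nested struct grew, so the enclosing descriptor grew too. */
static_assert(sizeof(struct demo_rx_desc_new) >
	      sizeof(struct demo_rx_desc_old),
	      "adding a field enlarges the descriptor");

/* Number of bytes by which the payload moved between the two layouts. */
static inline size_t demo_payload_shift(void)
{
	return offsetof(struct demo_rx_desc_new, payload) -
	       offsetof(struct demo_rx_desc_old, payload);
}
```

If the hardware keeps writing the old layout while the driver parses the
new one, everything after the grown structure is read from the wrong
place. Whether that is what actually happens on QCA6174 is only my guess.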

If I revert this commit's changes, the bug does not happen
(I tested the revert for two weeks, while otherwise the bug happens at
least once within 2-3 hours of the smartphone connecting to the Wi-Fi network).

Also, if I selectively remove only some of the changes introduced by the
faulty commit, the bug does not go away, so it looks like the problem is
the change in size of the data structures.

Now, I'd like to ask you what we can do to fix this problem...
Is there something I am doing wrong?
Or is there a bug in the firmware?

If the firmware can't be easily fixed, I was thinking that we could
abstract htt_rx_desc (in the same way we do with ops in other parts of
the driver) to have two versions:
one for 32-bit descriptors (like my QCA6174)
and one for 64-bit descriptors (such as WCN3990, which was the cause of
this change).
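
A rough sketch of what I have in mind follows. All names here
(sketch_rx_desc_ops, sketch_desc_v1, and so on) are made up for the
example and the layouts are placeholders, so this is only the shape of
the idea, not a patch against the real driver. Each hardware family gets
a small table describing its descriptor layout, and the common code goes
through the table instead of using one fixed struct:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder layouts; the real descriptors are much larger. */
struct sketch_desc_v1 {		/* older, smaller layout */
	uint32_t attention;
	uint32_t msdu_start[2];
	uint8_t payload[4];
};

struct sketch_desc_v2 {		/* newer layout with an extra word */
	uint32_t attention;
	uint32_t msdu_start[3];
	uint8_t payload[4];
};

/* Per-hardware description of the rx descriptor layout. */
struct sketch_rx_desc_ops {
	size_t desc_size;
	size_t payload_offset;
};

static const struct sketch_rx_desc_ops sketch_ops_v1 = {
	.desc_size = sizeof(struct sketch_desc_v1),
	.payload_offset = offsetof(struct sketch_desc_v1, payload),
};

static const struct sketch_rx_desc_ops sketch_ops_v2 = {
	.desc_size = sizeof(struct sketch_desc_v2),
	.payload_offset = offsetof(struct sketch_desc_v2, payload),
};

/* Hypothetical hardware IDs used only to pick a table. */
enum sketch_hw { SKETCH_HW_QCA6174, SKETCH_HW_WCN3990 };

static inline const struct sketch_rx_desc_ops *
sketch_rx_desc_ops_for(enum sketch_hw hw)
{
	return hw == SKETCH_HW_WCN3990 ? &sketch_ops_v2 : &sketch_ops_v1;
}

/* Common code locates the payload via the table, not a fixed struct. */
static inline const uint8_t *
sketch_rx_payload(const struct sketch_rx_desc_ops *ops, const void *desc)
{
	return (const uint8_t *)desc + ops->payload_offset;
}
```

The table would be chosen once at probe time, so the rx path only pays
for an extra pointer dereference per descriptor access.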

I'd be really happy to help, but I am not sure I fully understand what
is going on,
so what do you think is happening and what should we do?

Thanks in advance.
Greetings,
Francesco Magliocca


