ath10k_pci and Nvidia Jetson/Xavier

Wheaton, Ryan ryan.wheaton at cel.com
Thu Jan 27 06:52:10 PST 2022


Hello,

I tried to subscribe to the ath10k mailing list but never received a
confirmation. I'm not sure whether I need to be approved, but I'll
send my request anyway.  I'm hoping to find some help diagnosing an
issue with the ath10k driver/firmware; I've reached out to many
support sources and done a lot of reading without any luck.  We're a
module developer and a QC partner, and we have a QCA9377 module based
on the "Black Bean" from 8devices:

https://www.8devices.com/products/black-bean

We have a customer plugging the PCIe QCA9377 module into an M.2 E-key
interface alongside an arm64-based Nvidia Xavier.

The connector on the board is a standard M.2 E-key (not mini PCIe):
https://www.tti.com/content/ttiinc/en/apps/part-detail.html?mfrShortname=TYC&partsNumber=2199230-4&utm=top&channel=ppc&source=google&campaigns=tti-brand&gclid=Cj0KCQiA2ZCOBhDiARIsAMRfv9KXcvrg4CxIxp-u6ALN0de8TSOR5Sq43nI8uhEl22NPLiA3p14GkYEaAm_IEALw_wcB

They're using the mainline ath10k_pci module/driver with linux-4.9.140 
and got the firmware from:

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/ath10k/QCA9377

$ find /lib/firmware/ath10k/QCA9377/
/lib/firmware/ath10k/QCA9377/
/lib/firmware/ath10k/QCA9377/hw1.0
/lib/firmware/ath10k/QCA9377/hw1.0/notice_ath10k_firmware-6.txt
/lib/firmware/ath10k/QCA9377/hw1.0/firmware-sdio-5.bin
/lib/firmware/ath10k/QCA9377/hw1.0/notice_ath10k_firmware-sdio-5.txt
/lib/firmware/ath10k/QCA9377/hw1.0/firmware-5.bin
/lib/firmware/ath10k/QCA9377/hw1.0/notice_ath10k_firmware-5.txt
/lib/firmware/ath10k/QCA9377/hw1.0/board.bin
/lib/firmware/ath10k/QCA9377/hw1.0/board-2.bin
/lib/firmware/ath10k/QCA9377/hw1.0/firmware-6.bin

$ lsmod | grep ath10
ath10k_pci 59047 0
ath10k_core 343862 1 ath10k_pci
ath 27262 1 ath10k_core
mac80211 834112 1 ath10k_core
cfg80211 710072 3 mac80211,ath,ath10k_core

lspci and lsusb detect the QCA9377-P.

lspci:


$ lspci -s 0003:01:00.0 -vvv
0003:01:00.0 Network controller: Qualcomm Atheros QCA9377 802.11ac 
Wireless Network Adapter (rev 31)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 
Latency: 0
Interrupt: pin A routed to IRQ 765
Region 0: Memory at 12b0000000 (64-bit, non-prefetchable) [size=2M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA 
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/8 Maskable+ 64bit-
Address: fffff000 Data: 0000
Masking: fffffffe Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s 
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- 
BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via 
message
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF 
Disabled
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- 
ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, 
EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- 
ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- 
ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ 
MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP+ Rollover+ Timeout+ NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
Capabilities: [148 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [168 v1] Device Serial Number 00-00-00-00-00-00-00-00
Capabilities: [178 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [180 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ 
L1_PM_Substates+
PortCommonModeRestoreTime=50us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Kernel driver in use: ath10k_pci
Kernel modules: ath10k_pci
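
To make the flag-style registers above easier to scan, here is a
minimal Python sketch that splits one lspci line into the flags marked
"+" (set) and "-" (clear). The helper name split_flags is made up for
illustration, and the sketch assumes the register sits on a single
line, which is not true of the wrapped lines in this mail.

```python
def split_flags(line):
    """Split an lspci register line into (name, set_flags, clear_flags).

    Tokens ending in '+' are treated as set, tokens ending in '-' as
    clear; anything else (numbers, units) is ignored.
    """
    name, _, rest = line.partition(":")
    tokens = rest.split()
    set_flags = [t[:-1] for t in tokens if t.endswith("+")]
    clear_flags = [t[:-1] for t in tokens if t.endswith("-")]
    return name.strip(), set_flags, clear_flags


# Two lines copied from the lspci output above.
cesta = "CESta: RxErr+ BadTLP- BadDLLP+ Rollover+ Timeout+ NonFatalErr-"
devsta = "DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-"

for line in (cesta, devsta):
    name, on, off = split_flags(line)
    print(name, "set:", on)
```

Read this way, the endpoint's own CESta register has RxErr, BadDLLP,
Rollover and Timeout set, and DevSta has CorrErr set.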

lsusb:

Bus 001 Device 002: ID 0cf3:e500 Atheros Communications, Inc.

When trying to load the driver, they see these errors:

[ 6.846882] ath10k_pci 0003:01:00.0: enabling device (0000 -> 0002)
[ 6.847513] ath10k_pci 0003:01:00.0: pci irq msi oper_irq_mode 2 
irq_mode 0 reset_mode 0
[ 6.847914] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 6.847941] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 6.848215] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 6.848352] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 6.887751] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 6.887779] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 6.887999] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 6.888155] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 7.014505] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 7.014559] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 7.014773] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 7.014910] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 7.015037] pcieport 0003:00:00.0: AER: Corrected error received: 
id=0000
[ 7.015063] pcieport 0003:00:00.0: can't find device of ID0000
[ 7.054575] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 7.054632] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 7.054835] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 7.054971] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 7.055088] pcieport 0003:00:00.0: AER: Corrected error received: 
id=0000
[ 7.055105] pcieport 0003:00:00.0: can't find device of ID0000
[ 7.188092] mc-err: (255) csr_pcie3r: EMEM address decode error
[ 7.188228] mc-err: status = 0x200050de; addr = 0x519bdd40; 
hi_adr_reg=008
[ 7.188344] mc-err: secure: no, access-type: read
[ 7.188441] mc-err: mcerr: unknown intr source intstatus = 0x00000000, 
intstatus_1 = 0x00000000
[ 9.191230] ath10k_pci 0003:01:00.0: unable to get target info from 
device
[ 9.191395] ath10k_pci 0003:01:00.0: could not get target info (-110)
[ 9.191508] ath10k_pci 0003:01:00.0: could not probe fw (-110)
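
The status/mask pairs in the AER lines can be decoded by hand. The
sketch below is an illustration only: the bit names follow the
Correctable Error Status register layout in the PCIe specification,
and decode_aer is a hypothetical helper, not part of any existing tool.

```python
# Bit positions of the PCIe AER Correctable Error Status/Mask registers.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def bit_names(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def decode_aer(status_mask):
    """Decode a 'status/mask' string as printed by the kernel, in hex."""
    status_hex, mask_hex = status_mask.split("/")
    return bit_names(int(status_hex, 16)), bit_names(int(mask_hex, 16))


reported, masked = decode_aer("00000001/0000e000")
print("reported:", reported)
print("masked:  ", masked)
```

For 00000001/0000e000 the only reported bit is bit 0, Receiver Error,
which agrees with the "[ 0] Receiver Error" line in the log; the mask
covers bits 13 to 15.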

With the IOMMU enabled on the PCIe bus (Nvidia allows enabling and
disabling the IOMMU per PCIe bus in the device tree), they see slightly
different behavior:

[ 6.935042] ath10k_pci 0003:01:00.0: enabling device (0000 -> 0002)
[ 6.935646] ath10k_pci 0003:01:00.0: pci irq msi oper_irq_mode 2 
irq_mode 0 reset_mode 0
[ 6.936128] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 6.936267] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 6.939628] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 6.939825] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 6.972170] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 6.972324] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Transmitter ID)
[ 6.972531] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00001001/0000e000
[ 6.972690] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 6.972863] pcieport 0003:00:00.0: [12] Replay Timer Timeout
[ 6.972979] pcieport 0003:00:00.0: AER: Corrected error received: 
id=0000
[ 6.972997] pcieport 0003:00:00.0: can't find device of ID0000
[ 7.098806] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 7.098861] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 7.099074] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 7.099328] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 7.099475] pcieport 0003:00:00.0: AER: Corrected error received: 
id=0000
[ 7.099495] pcieport 0003:00:00.0: can't find device of ID0000
[ 7.136562] pcieport 0003:00:00.0: AER: Multiple Corrected error 
received: id=0000
[ 7.136591] pcieport 0003:00:00.0: PCIe Bus Error: severity=Corrected, 
type=Physical Layer, id=0000(Receiver ID)
[ 7.136871] pcieport 0003:00:00.0: device [10de:1ad2] error 
status/mask=00000001/0000e000
[ 7.137044] pcieport 0003:00:00.0: [ 0] Receiver Error (First)
[ 7.269652] ath10k_pci 0003:01:00.0: Direct firmware load for 
ath10k/pre-cal-pci-0003:01:00.0.bin failed with error -2
[ 7.269886] ath10k_pci 0003:01:00.0: Falling back to user helper
[ 7.270861] ath10k_pci 0003:01:00.0: Direct firmware load for 
ath10k/cal-pci-0003:01:00.0.bin failed with error -2
[ 7.271049] ath10k_pci 0003:01:00.0: Falling back to user helper
[ 7.275733] ath10k_pci 0003:01:00.0: qca9377 hw1.1 target 0x05020001 
chip_id 0x003821ff sub 0000:0000
[ 7.275740] ath10k_pci 0003:01:00.0: kconfig debug 0 debugfs 1 tracing 0 
dfs 0 testmode 0
[ 7.276884] ath10k_pci 0003:01:00.0: firmware ver 
WLAN.TF.1.0-00002-QCATFSWPZ-5 api 5 features ignore-otp crc32 c3e0d04f
[ 7.341669] ath10k_pci 0003:01:00.0: failed to fetch board data for 
bus=pci,vendor=168c,device=0042,subsystem-vendor=0000,subsystem-device=0000 
from ath10k/QCA9377/hw1.0/board-2.bin
[ 7.342629] ath10k_pci 0003:01:00.0: board_file api 1 bmi_id N/A crc32 
544289f7
[ 10.668806] ath10k_pci 0003:01:00.0: failed to receive control response 
completion, polling..
[ 11.692915] ath10k_pci 0003:01:00.0: ctl_resp never came in (-110)
[ 11.693166] ath10k_pci 0003:01:00.0: failed to connect to HTC: -110
[ 11.709420] ath10k_pci 0003:01:00.0: could not init core (-110)
[ 11.709799] ath10k_pci 0003:01:00.0: could not probe fw (-110)

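With the IOMMU enabled the probe still fails, but the driver gets
further: it reads the target info and loads the firmware before timing
out while connecting to HTC.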
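
The negative numbers in the ath10k lines are Linux errno values. A
small sketch, assuming unwrapped log lines and a hand-written table for
the two codes that appear here (failure_codes and ERRNO_NAMES are names
invented for this example):

```python
import re

# Linux errno values seen in the logs above; not a complete table.
ERRNO_NAMES = {
    2: "ENOENT (no such file or directory)",
    110: "ETIMEDOUT (connection timed out)",
}

# Matches a trailing "-110", "(-110)" or "error -2" at the end of a line.
TRAILING_CODE = re.compile(r"-(\d+)\)?\s*$")


def failure_codes(lines):
    """Return (code, name) for every line that ends in a negative number."""
    found = []
    for line in lines:
        match = TRAILING_CODE.search(line)
        if match:
            code = int(match.group(1))
            found.append((code, ERRNO_NAMES.get(code, "unknown")))
    return found


sample = [
    "ath10k_pci 0003:01:00.0: enabling device (0000 -> 0002)",
    "ath10k_pci 0003:01:00.0: Direct firmware load for "
    "ath10k/pre-cal-pci-0003:01:00.0.bin failed with error -2",
    "ath10k_pci 0003:01:00.0: ctl_resp never came in (-110)",
    "ath10k_pci 0003:01:00.0: failed to connect to HTC: -110",
]

for code, name in failure_codes(sample):
    print(code, name)
```

So the -2 lines only say that the optional calibration files do not
exist, while every -110 is a timeout waiting for the device.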

The board's M.2 E-key slot (PCIe and USB) has been verified working
with a different wifi module that also uses both PCIe and USB over
M.2 E-key; that module works.

The problem has also been posted to the Nvidia developer forums to try
to find more information toward a solution:

https://forums.developer.nvidia.com/t/qca9377-doesnt-work-on-xavier/198732

I've also attached more verbose dmesg debug logs, in case that helps.  
It seems that the device is detected but a connection is not 
established.

Any ideas on what might be going on or more debug steps to try?

Thanks very much in advance for any advice or direction you might be 
able to give.

Best,

-ryan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ath10k_verbose_log
Type: application/octet-stream
Size: 40956 bytes
Desc: ath10k_verbose_log
URL: <http://lists.infradead.org/pipermail/ath10k/attachments/20220127/58c1b627/attachment-0001.obj>

