ath12k WCN7850: Q6 Hexagon fault at WLAON region 0x1792000 ~2s post-AUTHORIZE on X1E80100
Baochen Qiang
baochen.qiang at oss.qualcomm.com
Wed May 13 18:55:30 PDT 2026
On 5/14/2026 4:47 AM, Marcus Glocker wrote:
> On Wed, May 13, 2026 at 01:26:50PM +0200, Marcus Glocker wrote:
>
>> On Wed, May 13, 2026 at 11:05:05AM +0800, Baochen Qiang wrote:
>>
>>>
>>>
>>> On 5/13/2026 3:59 AM, Marcus Glocker wrote:
>>>> On Tue, May 12, 2026 at 11:38:06AM +0800, Baochen Qiang wrote:
>>>>
>>>>>
>>>>>
>>>>> On 5/5/2026 5:08 AM, Marcus Glocker wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We're porting ath12k to OpenBSD as the qwz(4) driver, targeting Samsung
>>>>>> Galaxy Book4 Edge (X1E80100 SoC, WCN7850 hw2.0). Scan, auth, 4-way
>>>>>> handshake all complete; ~2 seconds after WPA2 AUTHORIZE the WCN7850
>>>>>> firmware crashes deterministically with:
>>>>>>
>>>>>> dlpager_main.c:147 Non Page Fault Exception cause code 0x 23
>>>>>> at Address: 0x 1792000
>>>>>>
>>>>>> Cause code 0x23 isn't a valid arm64 exception -- the fault is on the
>>>>>> WCN7850's on-die Hexagon Q6 DSP, with QURT's generic exception handler
>>>>>> (which happens to live in dlpager_main.c) printing it. So this is not
>>>>>> a host CPU fault.
>>>>>>
>>>>>> Per the RDDM segment table (at the start of the dump), VA 0x01792000
>>>>>> is the start of the chip's WLAON_DUMP region (size 0x820). The Q6 is
>>>>>> trying to read its own always-on hardware state region and the chip
>>>>>> refuses the access.
>>>>>>
>>>>>> (Samsung, Asus, Honor) with multiple FW builds. Currently testing
>>>>>> with WLAN.HMT.1.1.c5-00302-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.115823.3
>>>>>> (fw 0x110cffff, 2025-06-25) -- the exact blob a Linux ath12k user
>>>>>> runs successfully on the identical Samsung hardware. Same board-2.bin,
>>>>>> same compiled DTB (upstream hamoa.dtsi based).
>>>>>>
>>>>>> We've field-compared qwz against ath12k and ruled out (byte-level or
>>>>>> wire-level):
>>>>>>
>>>>>> * QMI host_cap, m3_info, wlan_cfg, wlan_ini, bdf_download (all
>>>>>> fields including ce_config, svc_to_ce_map, shadow_reg_v3,
>>>>>> feature_list, m3 paddr/size, nm_modem)
>>>>>> * MHI bringup ordering (BHI -> wait SBL EE -> wait M0 -> BHIE)
>>>>>> * BHI/BHIE DMA coherency
>>>>>> * ASPM disable before MHI start
>>>>>> * WLAON_WARM_SW_ENTRY zeroing + QFPROM_PWR_CTRL VDD4BLOW clear
>>>>>> * static_window_map=false + window-bank register init
>>>>>> * Per-chunk vs monolithic respond_mem allocation
>>>>>> * WMI_PEER_MIMO_PS_STATE = WMI_PEER_SMPS_PS_NONE (added matching
>>>>>> ath12k_setup_peer_smps; doesn't help)
>>>>>> * FW image variation (c5 and c7 both fail identically)
>>>>>>
>>>>>> Specifically NOT involved (we have evidence either way):
>>>>>>
>>>>>> * Gunyah -- X1E80100 is reportedly run in EL2 without Gunyah by
>>>>>> users where ath12k works; so Gunyah isn't programming WLAON
>>>>>> access for the Q6.
>>>>>> * SMMU / pcie_smmu -- pcie_smmu is status="reserved" upstream,
>>>>>> pcie4 has no iommus property; PCIe DMA bypasses SMMU.
>>>>>> * SCM/PAS -- ath12k's PCIe path makes no qcom_scm_* calls.
>>>>>>
>>>>>> Question: what subsystem inside the WCN7850 firmware touches the
>>>>>> WLAON region at 0x01792000 around 2 seconds after the host sends
>>>>>> WMI_PEER_AUTHORIZE? And what host-side configuration (WMI command,
>>>>>> HTT message, MHI state, etc.) primes that path so the access
>>>>>> succeeds on Linux?
>>>>>>
>>>>>> Even a pointer at the right Linux code path or the right FW-side
>>>>>> component would unblock us. We have full RDDM dumps and dmesg
>>>>>> captures available; happy to share off-list or as attachments.
>>>>>
>>>>> Please help collect the dmesg log of a successful ath12k run and of a failing qwz run for comparison.
>>>>>
>>>>> Please enable verbose ath12k log when loading ath12k driver:
>>>>>
>>>>> If you are using the latest upstream ath12k:
>>>>>
>>>>> sudo modprobe ath12k debug_mask=0xffffffff
>>>>> sudo modprobe ath12k_wifi7
>>>>>
>>>>> If you are using an old ath12k:
>>>>>
>>>>> sudo modprobe ath12k debug_mask=0xffffffff
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Marcus
>>>>>>
>>>>>
>>>>
>>>> Hi Baochen,
>>>>
>>>> Thanks for coming back on this topic.
>>>>
>>>> Attached the OpenBSD dmesg, with full ath12k driver debug logging
>>>
>>> the dmesg shows several WMI_INIT cmd instances which is not expected, because in normal
>>> operation this command should be sent only once.
>>>
>>> cat dmesg |grep -w 'sending WMI command 0x1'
>>> May 12 19:35:46 x1e /bsd: qwz_wmi_cmd_send_nowait: sending WMI command 0x1
>>> May 12 19:37:20 x1e /bsd: qwz_wmi_cmd_send_nowait: sending WMI command 0x1
>>> May 12 19:37:41 x1e /bsd: qwz_wmi_cmd_send_nowait: sending WMI command 0x1
>>> May 12 19:37:46 x1e /bsd: qwz_wmi_cmd_send_nowait: sending WMI command 0x1
>>> May 12 19:37:50 x1e /bsd: qwz_wmi_cmd_send_nowait: sending WMI command 0x1
>>>
>>> Other than that I don't find any clues.
>>
>> Yes, that is specific to the OpenBSD NIC framework. I've just tested
>> a quick hack with which the WMI_INIT cmd only gets issued once, but it
>> makes no difference to the firmware crash.
>>
>>>> enabled, plus the resulting RDDM binary after the firmware crash:
>>>
>>> How did you collect the RDDM binary? It does not seem to be in the right format; my tool
>>> cannot parse it correctly. Looking into the binary, at least the magic 'ATH12K-FW-DUMP' is
>>> not present at the very beginning.
>>
>> It looks like ath12k wraps the raw RDDM dump in some ath12k firmware
>> dump structure, which we don't do with our driver. I did write a small
>> conversion program, trying to generate the dump which you expect. You
>> can find the converted dump file here:
>>
>> https://nazgul.ch/pub/qwz0-rddm.bin.out.gz
>>
>> I hope you can load that into your tool.
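
For reference, a minimal sketch of the check described above: does a dump file begin with the magic 'ATH12K-FW-DUMP'? Only the magic string comes from this thread. The helper names has_dump_magic() and file_has_dump_magic() are hypothetical, and the rest of the ath12k dump header is deliberately not modeled.

```c
#include <stdio.h>
#include <string.h>

/* Magic string quoted in this thread; header fields after it are not modeled. */
#define DUMP_MAGIC "ATH12K-FW-DUMP"

/* Hypothetical helper: return 1 if buf (len bytes) starts with the magic, else 0. */
int has_dump_magic(const unsigned char *buf, size_t len)
{
	size_t mlen = strlen(DUMP_MAGIC);

	return len >= mlen && memcmp(buf, DUMP_MAGIC, mlen) == 0;
}

/*
 * Hypothetical helper: read the first bytes of a file and test them.
 * Returns 1 if the magic is present, 0 if not, -1 if the file cannot be opened.
 */
int file_has_dump_magic(const char *path)
{
	unsigned char head[32];
	size_t n;
	FILE *fp = fopen(path, "rb");

	if (fp == NULL)
		return -1;
	n = fread(head, 1, sizeof(head), fp);
	fclose(fp);
	return has_dump_magic(head, n);
}
```

Running file_has_dump_magic() on the decompressed dump before handing it to a parser gives a quick yes/no on whether the wrapper is in place. It says nothing about whether the remaining header fields are valid.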
>>
>>> And from which Linux version did you take the ath12k codebase?
>>
>> Well, that is a good question. qwz (the ath12k OpenBSD driver) started
>> as a clone of qwx (the ath11k OpenBSD driver), which is functional. On
>> top of that we made changes, the most recent of which sync missing
>> functionality from the Linux ath12k driver. We have already compared
>> qwz and ath12k extensively, but we can't spot an obvious difference
>> that could explain the firmware crash. That obviously doesn't mean
>> there isn't a gap between qwz and ath12k related to this issue which
>> we don't see.
>>
>>>>
>>>> https://nazgul.ch/pub/qwz0-rddm.bin.gz
>>>>
>>>> The command sequence on OpenBSD to reproduce that was:
>>>>
>>>> ifconfig qwz0 up # Bring the ath12k device up
>>>> ifconfig qwz0 scan # Scan for networks
>>>> ifconfig qwz0 nwid nazgul wpakey xxx # Start association
>>>>
>>>> Hi Max,
>>>>
>>>> Since you have Linux running on exactly the same Samsung Galaxy Book4
>>>> Edge 14" laptop, where ath12k works, would you be so kind and also
>>>> provide the dmesg output showing a successful association with the
>>>> ath12k driver debug logging enabled? See above how to enable that.
>>>> That would be very helpful!
>>>>
>>>> Thanks and Regards,
>>>> Marcus
>>>
>
> Hi Baochen,
>
> I just want to quickly let you know that we did overcome the firmware
> crash. The culprit was that we did
>
> #define RX_BE_PADDING0_BYTES 80 -> instead of 8
>
> which did break the hal_rx_desc_wcn7850 struct:
>
> struct hal_rx_desc_wcn7850 {
>         u64 msdu_end_tag;                       // offset 0
>         struct rx_msdu_end_qcn9274 msdu_end;    // offset 8
>         u8 rx_padding0[N];                      // <- the bug
>         u64 mpdu_start_tag;
>         struct rx_mpdu_start_qcn9274 mpdu_start;
>         struct rx_pkt_hdr_tlv pkt_hdr_tlv;
>         u8 msdu_payload[];
> };
>
> With that fixed, the firmware error is gone, and we can now receive
> an IP from DHCP. We're working on getting the TX path to work next.
OK, good to see it gets fixed!
>
> Thanks and Regards,
> Marcus
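
To illustrate the padding bug described above, here is a minimal sketch of how a wrong padding size moves every field that follows it in a descriptor struct. The demo_* types are hypothetical stand-ins: the 120-byte size for msdu_end is an assumption chosen so the compiler inserts no padding of its own, and it is not the real size of rx_msdu_end_qcn9274. Only the 8 versus 80 padding values come from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real rx_msdu_end_qcn9274 has a different size. */
struct demo_msdu_end {
	uint8_t raw[120];
};

/* Layout with the intended 8 bytes of padding. */
struct demo_rx_desc_good {
	uint64_t msdu_end_tag;
	struct demo_msdu_end msdu_end;
	uint8_t rx_padding0[8];
	uint64_t mpdu_start_tag;
};

/* Same layout with the mistyped 80 bytes of padding. */
struct demo_rx_desc_bad {
	uint64_t msdu_end_tag;
	struct demo_msdu_end msdu_end;
	uint8_t rx_padding0[80];
	uint64_t mpdu_start_tag;
};

/*
 * A compile-time guard of this kind turns a padding typo into a build
 * error. The expected offset here is derived from the stand-in sizes.
 */
_Static_assert(offsetof(struct demo_rx_desc_good, mpdu_start_tag) ==
    8 + sizeof(struct demo_msdu_end) + 8,
    "mpdu_start_tag is not at the expected offset");

/* How far the bad layout moves mpdu_start_tag relative to the good one. */
size_t demo_mpdu_tag_shift(void)
{
	return offsetof(struct demo_rx_desc_bad, mpdu_start_tag) -
	    offsetof(struct demo_rx_desc_good, mpdu_start_tag);
}
```

With these stand-in sizes the tag moves by 72 bytes (80 minus 8), and so does every field after it. A similar static assertion in a driver would need the expected offsets taken from the reference struct definitions, which this sketch does not attempt to reproduce.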