[PATCH] ath11k: pci: avoid unsafe register access during shutdown
Baochen Qiang
baochen.qiang at oss.qualcomm.com
Mon Aug 11 19:20:51 PDT 2025
On 8/11/2025 8:26 PM, Balsam Chihi wrote:
> On Mon, Aug 11, 2025 at 1:00 PM Baochen Qiang
> <baochen.qiang at oss.qualcomm.com> wrote:
>>
>>
>>
>> On 8/11/2025 5:08 PM, Balsam Chihi wrote:
>>> On Fri, Jul 25, 2025 at 5:25 AM Baochen Qiang
>>> <baochen.qiang at oss.qualcomm.com> wrote:
>>>>
>>>>
>>>>
>>>> On 6/12/2025 12:40 AM, balsam.chihi at moment.tech wrote:
>>>>> From: Balsam CHIHI <balsam.chihi at moment.tech>
>>>>>
>>>
>>> Hello, sorry for the delay. I was OoO.
>>>
>>>>> During system reboot or module removal (rmmod), a crash was observed
>>>>> due to a synchronous external abort caused by invalid register access
>>>>> in ath11k_pci_clear_dbg_registers(). This happens when the device is
>>>>> already powered off and the driver attempts to read debug registers.
>>>>
>>>> What does 'powered off' mean exactly here? is WLAN powered by an internal/external supply
>>>> that will be removed during rmmod/reboot?
>>>
>>> The "powered off" means that the wlan card is rmmod or reboot.
>>> It is not powered by external supply.
>>
>> OK.
>>
>>>
>>>>
>>>>>
>>>>> The crash log shows:
>>>>> root at OpenWrt:~# reboot
>>>>> [ 343.663492] Internal error: synchronous external abort:
>>>>> 0000000096000210 [#1] SMP
>>>>> [ 343.670992] Modules linked in: [...]
>>>>> [ 343.842432] CPU: 7 PID: 9435 Comm: procd Tainted: G O 6.6.86 #0
>>>>> [ 343.849746] Hardware name: LS1088A
>>>>> [ 343.856969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>> [ 343.863933] pc : ath11k_pci_get_msi_irq+0x18a0/0x1900 [ath11k_pci]
>>>>
>>>> I don't see any relationship between ath11k_pci_clear_dbg_registers() and this PC pointer.
>>>> So how did you locate the problematic register access then?
>>>
>>> I tyied to look for the problem based the PC pointer but I was not
>>> abled to find it.
>>> So, I added prints in ath11k driver to find out where it stucks,
>>> classical debug method.
>>
>> Yeah, classical and always work.
>>
>>> Then I found out that any "ath11k_pcic_read32()" call fails on rmmod
>>> ath11k_pci or reboot,
>>
>> I worry PCIe link is not stable on your platform when rmmod/reboot. Can you try with ASPM
>> disabled?
>
> ASPM is indeed disabled in the Kernel config.
> I will also try with your patch and get back to you with the results.
> And in the meanwhile, I would like to let you know the following information :
> Other PCIe devices are working well even with ASPM enabled in the same slot,
> and the issue is present in all PCIe slots.
Thanks, the info helps.
> I have many board variants of the same platform that showed the same output.
> Maybe it is only related to the Qualcomm QCN9074 WiFi6E chipset?
I think so.
> I tried 2 wifi cards from two manufacturers of the same wifi chip and
> the issue is the same.
Do you have a WCN6855 WiFi chip to try?
> Is it related to the SoC Freescale Layerscape LS1088A?
I never heard of such issues before. For now I am suspecting that your SoC PCIe RC behaves
differently from other platforms, like X86.
> It seems like the kernel is accessing a freed resource or no longer
> existing memory address because the wifi card is turned off.
WiFi chip might enter a state where access to some specific registers could not be
completed, hence the 'abort' error.
>
>
>>
>> --- a/drivers/net/wireless/ath/ath11k/pci.c
>> +++ b/drivers/net/wireless/ath/ath11k/pci.c
>> @@ -605,11 +605,13 @@ static void ath11k_pci_aspm_disable(struct ath11k_pci *ab_pci)
>>
>> static void ath11k_pci_aspm_restore(struct ath11k_pci *ab_pci)
>> {
>> +#if 0
>> if (test_and_clear_bit(ATH11K_PCI_ASPM_RESTORE, &ab_pci->flags))
>> pcie_capability_clear_and_set_word(ab_pci->pdev, PCI_EXP_LNKCTL,
>> PCI_EXP_LNKCTL_ASPMC,
>> ab_pci->link_ctl &
>> PCI_EXP_LNKCTL_ASPMC);
>> +#endif
>> }
>>
>> #ifdef CONFIG_DEV_COREDUMP
>>
Anyway, please try with above change. In addition you can manually confirm ASPM status in
LinkCtrl register with lspci tool, on my machine it is like:
$ sudo lspci -s 02:00.0 -vv
02:00.0 Network controller: Qualcomm Device 1107 (rev 01)
[...]
Capabilities: [70] Express (v2) Endpoint, MSI 00
[...]
LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
[...]
>>
>>> but they work fine when insmod/boot ath11k_pci.
>>>
>>>>
>>>>> [ 343.870125] lr : ath11k_pcic_init_msi_config+0x98/0xc4 [ath11k]
>>>>> [...]
>>>>> [ 343.950800] Call trace:
>>>>> [ 343.953241] ath11k_pci_get_msi_irq+0x18a0/0x1900 [ath11k_pci]
>>>>> [ 343.959080] ath11k_pcic_init_msi_config+0x98/0xc4 [ath11k]
>>>>> [ 343.964671] ath11k_pcic_read32+0x30/0xb4 [ath11k]
>>>>> [ 343.969477] ath11k_pci_get_msi_irq+0x528/0x1900 [ath11k_pci]
>>>>> [ 343.975230] ath11k_pci_get_msi_irq+0x1460/0x1900 [ath11k_pci]
>>>>> [ 343.981068] ath11k_pci_get_msi_irq+0x1750/0x1900 [ath11k_pci]
>>>>> [ 343.986906] pci_device_shutdown+0x34/0x44
>>>>> [ 343.991004] device_shutdown+0x160/0x268
>>>>> [ 343.994928] kernel_restart+0x40/0xc0
>>>>> [ 343.998594] __do_sys_reboot+0x104/0x23c
>>>>> [ 344.002518] __arm64_sys_reboot+0x24/0x30
>>>>> [ 344.006529] do_el0_svc+0x6c/0x100
>>>>> [ 344.009931] el0_svc+0x28/0x9c
>>>>> [ 344.012986] el0t_64_sync_handler+0x120/0x12c
>>>>> [ 344.017344] el0t_64_sync+0x178/0x17c
>>>>> [ 344.021009] Code: f94e0a80 92404a73 91420273 8b130013 (b9400273)
>>>>> [ 344.027102] ---[ end trace 0000000000000000 ]---
>>>>>
>>>>> This patch adds a `power_on` flag to conditionally skip debug register
>>>>> reads when the device is not powered, preventing invalid memory access.
>>>>
>>>> why is only 'read' concerned here?
>>>
>>> Good question, I can't explain why only the reading fails but not the writing.
>>>
>>>>
>>>> You are skipping several 'read' in this change, could you help narrow down which
>>>> individual read instance causes this?
>>>>
>>>
>>> All read calls fail on rmmod/reboot, when
>>> ath11k_pci_clear_dbg_registers() is called.
>>> But they all succeed when insmod/boot ath11k_pci.
>>>
>>>>>
>>>>> With this change, the system no longer crashes on reboot or rmmod,
>>>>> and the driver continues to function correctly when reloaded.
>>>>>
>>>>> Tested-on: QCN9074 hw2.0 PCIe on LS1088A
>>>>> Tested-on: OpenWrt 24.10.1
>>>>> Fixes: crash on reboot/rmmod due to invalid register access
>>>>> Signed-off-by: Balsam CHIHI <balsam.chihi at moment.tech>
>>>>> ---
>>>>> drivers/net/wireless/ath/ath11k/pci.c | 36 ++++++++++++++++-----------
>>>>> 1 file changed, 21 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c
>>>>> index 78444f8ea153..0c1b55957557 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/pci.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/pci.c
>>>>> @@ -203,16 +203,18 @@ static void ath11k_pci_soc_global_reset(struct ath11k_base *ab)
>>>>> ath11k_warn(ab, "link down error during global reset\n");
>>>>> }
>>>>>
>>>>> -static void ath11k_pci_clear_dbg_registers(struct ath11k_base *ab)
>>>>> +static void ath11k_pci_clear_dbg_registers(struct ath11k_base *ab, bool power_on)
>>>>> {
>>>>> - u32 val;
>>>>> + if (power_on) {
>>>>> + u32 val;
>>>>>
>>>>> - /* read cookie */
>>>>> - val = ath11k_pcic_read32(ab, PCIE_Q6_COOKIE_ADDR);
>>>>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "pcie_q6_cookie_addr 0x%x\n", val);
>>>>> + /* read cookie */
>>>>> + val = ath11k_pcic_read32(ab, PCIE_Q6_COOKIE_ADDR);
>>>>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "pcie_q6_cookie_addr 0x%x\n", val);
>>>>>
>>>>> - val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY);
>>>>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val);
>>>>> + val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY);
>>>>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val);
>>>>> + }
>>>>>
>>>>> /* TODO: exact time to sleep is uncertain */
>>>>> mdelay(10);
>>>>> @@ -223,14 +225,18 @@ static void ath11k_pci_clear_dbg_registers(struct ath11k_base *ab)
>>>>> ath11k_pcic_write32(ab, WLAON_WARM_SW_ENTRY, 0);
>>>>> mdelay(10);
>>>>>
>>>>> - val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY);
>>>>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val);
>>>>> + if (power_on) {
>>>>> + u32 val;
>>>>>
>>>>> - /* A read clear register. clear the register to prevent
>>>>> - * Q6 from entering wrong code path.
>>>>> - */
>>>>> - val = ath11k_pcic_read32(ab, WLAON_SOC_RESET_CAUSE_REG);
>>>>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "soc reset cause %d\n", val);
>>>>> + val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY);
>>>>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val);
>>>>> +
>>>>> + /* A read clear register. clear the register to prevent
>>>>> + * Q6 from entering wrong code path.
>>>>> + */
>>>>> + val = ath11k_pcic_read32(ab, WLAON_SOC_RESET_CAUSE_REG);
>>>>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "soc reset cause %d\n", val);
>>>>> + }
>>>>> }
>>>>>
>>>>> static int ath11k_pci_set_link_reg(struct ath11k_base *ab,
>>>>> @@ -368,7 +374,7 @@ static void ath11k_pci_sw_reset(struct ath11k_base *ab, bool power_on)
>>>>> }
>>>>>
>>>>> ath11k_mhi_clear_vector(ab);
>>>>> - ath11k_pci_clear_dbg_registers(ab);
>>>>> + ath11k_pci_clear_dbg_registers(ab, power_on);
>>>>> ath11k_pci_soc_global_reset(ab);
>>>>> ath11k_mhi_set_mhictrl_reset(ab);
>>>>> }
>>>>
>>>
>>
More information about the ath11k
mailing list