[PATCH v2] wifi: ath11k: Add firmware coredump collection support

Miaoqing Pan quic_miaoqing at quicinc.com
Fri Jul 12 02:38:11 PDT 2024



On 7/11/2024 12:20 AM, Kalle Valo wrote:
> Miaoqing Pan <quic_miaoqing at quicinc.com> writes:
> 
>> In case of firmware assert snapshot of firmware memory is essential for
>> debugging. Add firmware coredump collection support for PCI bus.
>> Collect RDDM and firmware paging dumps from MHI and pack them in TLV
>> format and also pack various memory shared during QMI phase in separate
>> TLVs.  Add necessary header and share the dumps to user space using dev
>> coredump framework. Coredump collection is disabled by default and can
>> be enabled using menuconfig. Dump collected for a radio is 55 MB
>> approximately.
>>
>> The changeset is mostly copied from:
>> https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@quicinc.com/.
>>
>> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>>
>> Signed-off-by: Miaoqing Pan <quic_miaoqing at quicinc.com>
> 
> This has a similar KASAN warning as did the ath12k patch:
> 
> [   48.364483] modprobe (1116) used greatest stack depth: 21760 bytes left
> [   48.450859] ath11k_pci 0000:06:00.0: chip_id 0x2 chip_family 0xb board_id 0x106 soc_id 0x400c0200
> [   48.451252] ath11k_pci 0000:06:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
> [   63.694922] ath11k_pci 0000:06:00.0: simulating firmware assert crash
> [   64.118179] ath11k_pci 0000:06:00.0: firmware crashed: MHI_CB_EE_RDDM
> [   64.132388] ==================================================================
> [   64.132470] BUG: KASAN: vmalloc-out-of-bounds in ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [   64.132530] Write of size 4 at addr ffffc9000497520c by task kworker/u32:2/88
> [   64.132578]
> [   64.132610] CPU: 5 PID: 88 Comm: kworker/u32:2 Not tainted 6.10.0-rc6-wt-ath+ #1678
> [   64.132659] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
> [   64.132719] Workqueue: ath11k_aux_wq ath11k_core_reset [ath11k]
> [   64.132791] Call Trace:
> [   64.132827]  <TASK>
> [   64.132867]  dump_stack_lvl+0x7d/0xe0
> [   64.132910]  print_address_description.constprop.0+0x33/0x3a0
> [   64.132958]  print_report+0xb5/0x260
> [   64.133038]  ? kasan_addr_to_slab+0xd/0x80
> [   64.133096]  kasan_report+0xd8/0x110
> [   64.133132]  ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [   64.133179]  ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [   64.133225]  __asan_report_store_n_noabort+0x12/0x20
> [   64.133266]  ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
> [   64.133311]  ? ath11k_pci_coredump_calculate_size+0x710/0x710 [ath11k_pci]
> [   64.133358]  ? lock_sync+0x1a0/0x1a0
> [   64.133398]  ath11k_coredump_collect+0x60/0x73 [ath11k]
> [   64.133466]  ath11k_core_reset+0x225/0x640 [ath11k]
> [   64.133524]  ? debug_smp_processor_id+0x17/0x20
> [   64.133564]  process_one_work+0x8cc/0x19c0
> [   64.133893]  ? pwq_dec_nr_in_flight+0x580/0x580
> [   64.133934]  ? move_linked_works+0x128/0x2c0
> [   64.134007]  ? assign_work+0x15e/0x270
> [   64.134074]  worker_thread+0x715/0x1230
> [   64.134114]  ? __this_cpu_preempt_check+0x13/0x20
> [   64.134153]  ? lockdep_hardirqs_on+0x7d/0x100
> [   64.134192]  ? rescuer_thread+0xdb0/0xdb0
> [   64.134229]  kthread+0x2fa/0x3f0
> [   64.134266]  ? kthread_insert_work_sanity_check+0xd0/0xd0
> [   64.134308]  ret_from_fork+0x31/0x70
> [   64.134345]  ? kthread_insert_work_sanity_check+0xd0/0xd0
> [   64.134386]  ret_from_fork_asm+0x11/0x20
> [   64.134426]  </TASK>
> [   64.134459]
> [   64.134498] The buggy address belongs to the virtual mapping at#012[   64.134498]  [ffffc90003965000, ffffc90004977000) created by:#012[   64.134498]  ath11k_pci_coredump_download+0x144/0x12c0 [ath11k_pci]
> [   64.134576]
> [   64.134606] The buggy address belongs to the physical page:
> [   64.134648] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13faee
> [   64.134699] flags: 0x200000000000000(node=0|zone=2)
> [   64.134746] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
> [   64.134796] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> [   64.135523] page dumped because: kasan: bad access detected
> [   64.136273]
> [   64.136928] Memory state around the buggy address:
> [   64.137641]  ffffc90004975100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   64.138361]  ffffc90004975180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   64.139074] >ffffc90004975200: 00 04 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [   64.139742]                       ^
> [   64.140495]  ffffc90004975280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [   64.141222]  ffffc90004975300: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
> [   64.141892] ==================================================================
> [   64.142674] Disabling lock debugging due to kernel taint
> [   64.143630] ath11k_pci 0000:06:00.0: Uploading coredump

Thanks for catching this issue, it's triggered by the statement of 
'dump_tlv->type = cpu_to_le32(mem_idx);' The 'buf' which assigned to 
'dump_tlv' is out-of-bounds. Will be fixed in the next version.

> 
>> ---
>> v2: fix implicit declaration of function 'vzalloc'.
>>
>>   drivers/net/wireless/ath/ath11k/Kconfig    |  11 ++
>>   drivers/net/wireless/ath/ath11k/Makefile   |   1 +
>>   drivers/net/wireless/ath/ath11k/core.c     |   2 +
>>   drivers/net/wireless/ath/ath11k/core.h     |   5 +
>>   drivers/net/wireless/ath/ath11k/coredump.c |  52 ++++++
>>   drivers/net/wireless/ath/ath11k/coredump.h |  79 +++++++++
>>   drivers/net/wireless/ath/ath11k/hif.h      |   7 +
>>   drivers/net/wireless/ath/ath11k/mhi.c      |   5 +
>>   drivers/net/wireless/ath/ath11k/mhi.h      |   1 +
>>   drivers/net/wireless/ath/ath11k/pci.c      | 191 +++++++++++++++++++++
>>   drivers/net/wireless/ath/ath11k/qmi.c      |  45 ++---
>>   drivers/net/wireless/ath/ath11k/qmi.h      |   9 +-
>>   12 files changed, 384 insertions(+), 24 deletions(-)
>>   create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c
>>   create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h
> 
> I feel that the QMI changes should be in a separat patch and explaining
> in detail what they are about. Didn't review those now as there's no
> explanation.

Minor changes for updating 'iaddr' definition. IMO, don't need a 
separate patch.
struct target_mem_chunk {
         u32 prev_size;
         u32 prev_type;
         dma_addr_t paddr;
-       u32 *vaddr;
-       void __iomem *iaddr;
+       union {
+               u32 *vaddr;
+               void __iomem *iaddr;
+       } v;
  };


> 
>> diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig
>> index 27f0523bf967..bb91da0098b4 100644
>> --- a/drivers/net/wireless/ath/ath11k/Kconfig
>> +++ b/drivers/net/wireless/ath/ath11k/Kconfig
>> @@ -57,3 +57,14 @@ config ATH11K_SPECTRAL
>>   	  Enable ath11k spectral scan support
>>   
>>   	  Say Y to enable access to the FFT/spectral data via debugfs.
>> +
>> +config ATH11K_COREDUMP
>> +	bool "ath11k coredump"
>> +	depends on ATH11K
>> +	select WANT_DEV_COREDUMP
>> +	help
>> +	  Enable ath11k coredump collection
>> +
>> +	  If unsure, say Y to make it easier to debug problems. But if
>> +	  dump collection not required choose N.
> 
> I'm not sure if a new Kconfig option is justified? Maybe instead just
> use CONFIG_DEV_COREDUMP directly.

ok, will be removed.




More information about the ath11k mailing list