[PATCH v2] wifi: ath11k: Add firmware coredump collection support
Kalle Valo
kvalo at kernel.org
Wed Jul 10 09:20:50 PDT 2024
Miaoqing Pan <quic_miaoqing at quicinc.com> writes:
> In case of firmware assert snapshot of firmware memory is essential for
> debugging. Add firmware coredump collection support for PCI bus.
> Collect RDDM and firmware paging dumps from MHI and pack them in TLV
> format and also pack various memory shared during QMI phase in separate
> TLVs. Add necessary header and share the dumps to user space using dev
> coredump framework. Coredump collection is disabled by default and can
> be enabled using menuconfig. Dump collected for a radio is 55 MB
> approximately.
>
> The changeset is mostly copied from:
> https://lore.kernel.org/all/20240325183414.4016663-1-quic_ssreeela@quicinc.com/.
>
> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-04358-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
>
> Signed-off-by: Miaoqing Pan <quic_miaoqing at quicinc.com>
This has a similar KASAN warning as did the ath12k patch:
[ 48.364483] modprobe (1116) used greatest stack depth: 21760 bytes left
[ 48.450859] ath11k_pci 0000:06:00.0: chip_id 0x2 chip_family 0xb board_id 0x106 soc_id 0x400c0200
[ 48.451252] ath11k_pci 0000:06:00.0: fw_version 0x11088c35 fw_build_timestamp 2024-04-17 08:34 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
[ 63.694922] ath11k_pci 0000:06:00.0: simulating firmware assert crash
[ 64.118179] ath11k_pci 0000:06:00.0: firmware crashed: MHI_CB_EE_RDDM
[ 64.132388] ==================================================================
[ 64.132470] BUG: KASAN: vmalloc-out-of-bounds in ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.132530] Write of size 4 at addr ffffc9000497520c by task kworker/u32:2/88
[ 64.132578]
[ 64.132610] CPU: 5 PID: 88 Comm: kworker/u32:2 Not tainted 6.10.0-rc6-wt-ath+ #1678
[ 64.132659] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
[ 64.132719] Workqueue: ath11k_aux_wq ath11k_core_reset [ath11k]
[ 64.132791] Call Trace:
[ 64.132827] <TASK>
[ 64.132867] dump_stack_lvl+0x7d/0xe0
[ 64.132910] print_address_description.constprop.0+0x33/0x3a0
[ 64.132958] print_report+0xb5/0x260
[ 64.133038] ? kasan_addr_to_slab+0xd/0x80
[ 64.133096] kasan_report+0xd8/0x110
[ 64.133132] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.133179] ? ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.133225] __asan_report_store_n_noabort+0x12/0x20
[ 64.133266] ath11k_pci_coredump_download+0x10db/0x12c0 [ath11k_pci]
[ 64.133311] ? ath11k_pci_coredump_calculate_size+0x710/0x710 [ath11k_pci]
[ 64.133358] ? lock_sync+0x1a0/0x1a0
[ 64.133398] ath11k_coredump_collect+0x60/0x73 [ath11k]
[ 64.133466] ath11k_core_reset+0x225/0x640 [ath11k]
[ 64.133524] ? debug_smp_processor_id+0x17/0x20
[ 64.133564] process_one_work+0x8cc/0x19c0
[ 64.133893] ? pwq_dec_nr_in_flight+0x580/0x580
[ 64.133934] ? move_linked_works+0x128/0x2c0
[ 64.134007] ? assign_work+0x15e/0x270
[ 64.134074] worker_thread+0x715/0x1230
[ 64.134114] ? __this_cpu_preempt_check+0x13/0x20
[ 64.134153] ? lockdep_hardirqs_on+0x7d/0x100
[ 64.134192] ? rescuer_thread+0xdb0/0xdb0
[ 64.134229] kthread+0x2fa/0x3f0
[ 64.134266] ? kthread_insert_work_sanity_check+0xd0/0xd0
[ 64.134308] ret_from_fork+0x31/0x70
[ 64.134345] ? kthread_insert_work_sanity_check+0xd0/0xd0
[ 64.134386] ret_from_fork_asm+0x11/0x20
[ 64.134426] </TASK>
[ 64.134459]
[ 64.134498] The buggy address belongs to the virtual mapping at#012[ 64.134498] [ffffc90003965000, ffffc90004977000) created by:#012[ 64.134498] ath11k_pci_coredump_download+0x144/0x12c0 [ath11k_pci]
[ 64.134576]
[ 64.134606] The buggy address belongs to the physical page:
[ 64.134648] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x13faee
[ 64.134699] flags: 0x200000000000000(node=0|zone=2)
[ 64.134746] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
[ 64.134796] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[ 64.135523] page dumped because: kasan: bad access detected
[ 64.136273]
[ 64.136928] Memory state around the buggy address:
[ 64.137641] ffffc90004975100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 64.138361] ffffc90004975180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 64.139074] >ffffc90004975200: 00 04 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 64.139742] ^
[ 64.140495] ffffc90004975280: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 64.141222] ffffc90004975300: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
[ 64.141892] ==================================================================
[ 64.142674] Disabling lock debugging due to kernel taint
[ 64.143630] ath11k_pci 0000:06:00.0: Uploading coredump
> ---
> v2: fix implicit declaration of function 'vzalloc'.
>
> drivers/net/wireless/ath/ath11k/Kconfig | 11 ++
> drivers/net/wireless/ath/ath11k/Makefile | 1 +
> drivers/net/wireless/ath/ath11k/core.c | 2 +
> drivers/net/wireless/ath/ath11k/core.h | 5 +
> drivers/net/wireless/ath/ath11k/coredump.c | 52 ++++++
> drivers/net/wireless/ath/ath11k/coredump.h | 79 +++++++++
> drivers/net/wireless/ath/ath11k/hif.h | 7 +
> drivers/net/wireless/ath/ath11k/mhi.c | 5 +
> drivers/net/wireless/ath/ath11k/mhi.h | 1 +
> drivers/net/wireless/ath/ath11k/pci.c | 191 +++++++++++++++++++++
> drivers/net/wireless/ath/ath11k/qmi.c | 45 ++---
> drivers/net/wireless/ath/ath11k/qmi.h | 9 +-
> 12 files changed, 384 insertions(+), 24 deletions(-)
> create mode 100644 drivers/net/wireless/ath/ath11k/coredump.c
> create mode 100644 drivers/net/wireless/ath/ath11k/coredump.h
I feel that the QMI changes should be in a separat patch and explaining
in detail what they are about. Didn't review those now as there's no
explanation.
> diff --git a/drivers/net/wireless/ath/ath11k/Kconfig b/drivers/net/wireless/ath/ath11k/Kconfig
> index 27f0523bf967..bb91da0098b4 100644
> --- a/drivers/net/wireless/ath/ath11k/Kconfig
> +++ b/drivers/net/wireless/ath/ath11k/Kconfig
> @@ -57,3 +57,14 @@ config ATH11K_SPECTRAL
> Enable ath11k spectral scan support
>
> Say Y to enable access to the FFT/spectral data via debugfs.
> +
> +config ATH11K_COREDUMP
> + bool "ath11k coredump"
> + depends on ATH11K
> + select WANT_DEV_COREDUMP
> + help
> + Enable ath11k coredump collection
> +
> + If unsure, say Y to make it easier to debug problems. But if
> + dump collection not required choose N.
I'm not sure if a new Kconfig option is justified? Maybe instead just
use CONFIG_DEV_COREDUMP directly.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
More information about the ath11k
mailing list