Deadlock on (faked) firmware crash, CUS239, modified 10.4.3 firmware.

Michal Kazior michal.kazior at tieto.com
Tue Mar 29 01:14:22 PDT 2016


On 26 March 2016 at 03:27, Ben Greear <greearb at candelatech.com> wrote:
> I've been seeing this for a while now.  When firmware crashes, often the OS
> at least partially locks up.
>
> This is modified 4.4.6 driver/kernel, modified 10.4.3 firmware.  I had 35
> stations associated, and reset one.  Flush fails (maybe because nothing
> stops tx on other vdevs while flushing one?) and I added a fake firmware
> crash event in case flush fails.
>
> Then, I get deadlock.  I've seen other similar deadlocks when the firmware
> crashed due to 'natural' causes when adding vdevs....
>
> Looks like the same process is not actually stuck in one place...each time
> the kernel splats, it is in a different place..spinning and spinning.  Maybe
> it needs a bail-out on firmware crash?
[...]
> [  316.477677] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/u8:3:257]
> [  316.477720] Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 8021q garp mrp stp llc bnep bluetooth fuse macvlan wanlink(O) pktgen rpcsec_gss_krb5 nfsv4 nfs fscache iTCO_wdt iTCO_vendor_support coretemp ath9k ath10k_pci hwmon ath9k_common ath10k_core ath9k_hw intel_rapl iosf_mbi ath x86_pkg_temp_thermal intel_powerclamp mac80211 kvm_intel kvm joydev irqbypass pcspkr serio_raw cfg80211 snd_hda_codec_hdmi lpc_ich i2c_i801 snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm 8250_fintek snd_timer snd shpchp soundcore tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ata_generic pata_acpi i915 e1000e ptp pps_core i2c_algo_bit drm_kms_helper drm i2c_core fjes video ipv6 [last unloaded: nf_conntrack]
>
> [  316.477721] irq event stamp: 2111179
> [  316.477727] hardirqs last  enabled at (2111179): [<ffffffff8113c347>] vprintk_emit+0x3ab/0x46a
> [  316.477730] hardirqs last disabled at (2111178): [<ffffffff8113bff8>] vprintk_emit+0x5c/0x46a
> [  316.477742] softirqs last  enabled at (2111014): [<ffffffffa0e30965>] ath10k_set_key+0x136/0x602 [ath10k_core]
> [  316.477749] softirqs last disabled at (2111012): [<ffffffffa0e30946>] ath10k_set_key+0x117/0x602 [ath10k_core]
> [  316.477751] CPU: 1 PID: 257 Comm: kworker/u8:3 Tainted: G        W  O    4.4.6+ #21
> [  316.477752] Hardware name: To be filled by O.E.M. To be filled by O.E.M./HURONRIVER, BIOS 4.6.5 05/02/2012
> [  316.477780] Workqueue: wiphy3 ieee80211_iface_work [mac80211]
> [  316.477781] task: ffff880212d225c0 ti: ffff880212d50000 task.ti: ffff880212d50000
> [  316.477790] RIP: 0010:[<ffffffffa0e38c1b>]  [<ffffffffa0e38c1b>] ath10k_mac_tx_push_pending+0xc1/0x12d [ath10k_core]

Just in case, do you have these applied?

 750eeed89cf3 ath10k: fix pull-push tx threshold handling
 9d71d47eed20 ath10k: fix tx hang

Hmm... If it still reproduces, can you try the following diff?

--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -3780,6 +3780,8 @@ void ath10k_mac_tx_push_pending(struct ath10k *ar)
                list_del_init(&artxq->list);
                if (ret != -ENOENT)
                        list_add_tail(&artxq->list, &ar->txqs);
+               else if (artxq == last)
+                       last = list_last_entry(&ar->txqs, struct ath10k_txq, list);

                ath10k_htt_tx_txq_update(hw, txq);


Michał
