ath10k driver crashes whenever firmware crashes on ARM SoC

Kalle Valo kvalo at qca.qualcomm.com
Wed Jan 29 11:41:48 EST 2014


Hi,

Avery Pennarun <apenwarr at gmail.com> writes:

> When the ath10k firmware crashes on my device (let's not worry about
> why the firmware crashes right now; one problem at a time), my host
> CPU (ARMv7 based) can't recover.  I get some variant of this error:
>
> [  780.116977] Unhandled fault: imprecise external abort (0x1406) at 0x2ac3706c
> [  780.124336] Internal error: : 1406 [#1] SMP
>
> I've narrowed this down to this code in ath10k/pci.c, ath10k_pci_device_reset:
>
>         /* Put Target, including PCIe, into RESET. */
>         val = ath10k_pci_reg_read32(ar, SOC_GLOBAL_RESET_ADDRESS);
>         val |= 1;
>         ath10k_pci_reg_write32(ar, SOC_GLOBAL_RESET_ADDRESS, val);
>         for (i = 0; i < ATH_PCI_RESET_WAIT_MAX; i++) {
>                 if (ath10k_pci_reg_read32(ar, RTC_STATE_ADDRESS) &
>                                           RTC_STATE_COLD_RESET_MASK)
>                         break;
>                 msleep(1);
>         }

Are you using the CUS223 board? I was told that it has a problem with
cold reset: when you issue the cold reset, some voltage on the board
drops too low and there's a chance that this breaks PCI communication.

> Specifically, the pci_reg_read32().  I can insert as much time as I
> want between the write32 and the read32; it always performs the read,
> then crashes with the PC pointing a few instructions later, inside the
> msleep(), with the imprecise external abort.  I think this means the
> PCI read operation has encountered a PCI target abort, which suggests
> that the SOC_GLOBAL_RESET_ADDRESS line has not successfully reset the
> device.  From what I understand, on x86 processors PCI target aborts
> are not fatal, so you might not notice this problem on those
> platforms, but it's bad on ARM.

FWIW the same problem also happens on MIPS.

> I'm using the ath10k driver from linux-next 20140117, but I had the
> same problem with 3.13-rc2 so I don't think this has changed.
>
> Are other people seeing this?  Is there something I can try to resolve it?

Yes, we see it as well. We also see it when bringing the interface down,
for example when shutting down hostapd. We will soon post patches to
work around the interface down issue, but for firmware crashes we haven't
yet found a reliable solution. I hope there's a way to fix warm reset so
that it properly recovers from a firmware crash.
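
As an aside, the quoted reset loop leaves the for loop the same way
whether the reset bit was seen or the retries ran out. Below is a
minimal sketch of a poll that reports the difference. It is not the
ath10k code: the mask value, the retry count, the result codes and the
fake_dev stub are all made up for illustration. The all-ones check only
helps on platforms where a failed PCI read returns 0xFFFFFFFF instead of
raising an abort, so it would not prevent the ARM abort described above.

```c
#include <stdint.h>

/* Hypothetical values for illustration; not the ath10k definitions. */
#define SKETCH_COLD_RESET_MASK 0x00000004u
#define SKETCH_RESET_WAIT_MAX  10

enum reset_result {
	RESET_OK       =  0,	/* reset bit observed */
	RESET_TIMEOUT  = -1,	/* retries exhausted, bit never set */
	RESET_BUS_DEAD = -2,	/* read returned all ones */
};

/* Register reader callback, so hardware access can be stubbed out. */
typedef uint32_t (*reg_read_fn)(void *ctx);

enum reset_result wait_for_cold_reset(reg_read_fn read_state, void *ctx)
{
	int i;

	for (i = 0; i < SKETCH_RESET_WAIT_MAX; i++) {
		uint32_t val = read_state(ctx);

		/* Assumption: a failed read shows up as all ones. */
		if (val == 0xFFFFFFFFu)
			return RESET_BUS_DEAD;
		if (val & SKETCH_COLD_RESET_MASK)
			return RESET_OK;
		/* A real driver would sleep here, e.g. msleep(1). */
	}
	return RESET_TIMEOUT;
}

/* Stub device: reads 0 until ready_after reads have been made. */
struct fake_dev {
	int reads;
	int ready_after;
	uint32_t stuck_value;	/* if non-zero, every read returns this */
};

uint32_t fake_read(void *ctx)
{
	struct fake_dev *d = ctx;

	if (d->stuck_value)
		return d->stuck_value;
	return (d->reads++ >= d->ready_after) ? SKETCH_COLD_RESET_MASK : 0;
}
```

With the stub, a device that becomes ready after three reads gives
RESET_OK, one that never becomes ready gives RESET_TIMEOUT, and one
stuck at 0xFFFFFFFF gives RESET_BUS_DEAD, so the caller can decide
whether to retry or give up.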

-- 
Kalle Valo


