[ath9k-devel] [PATCH] ath10k: Fix crash when using v1 hardware.

Ben Greear greearb at candelatech.com
Thu Jul 11 11:05:11 EDT 2013


On 07/11/2013 02:36 AM, Kalle Valo wrote:
> greearb at candelatech.com writes:
>
>> From: Ben Greear <greearb at candelatech.com>
>>
>> I put a v1 NIC from a TP-LINK AC 1750 AP in
>> a 64-bit PC, and the OS crashes on bootup.  I'm not
>> sure how broken my hardware is (possibly completely
>> non-functional), but at least with this patch it will no longer
>> crash the OS.  I'm not sure it ever got far enough to try,
>> but I also do not have firmware for the NIC.
>>
>> With this patch I get this info on module load:
>>
>> ath10k_pci 0000:05:00.0: BAR 0: assigned [mem 0xf4400000-0xf45fffff 64bit]
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (0xf4400004 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: BAR 0: error updating (high 0x000000 != 0xffffffff)
>> ath10k_pci 0000:05:00.0: Refused to change power state, currently in D3
>> ath10k: MSI-X interrupt handling (8 intrs)
>> ath10k: Unable to wakeup target
>> ath10k: target takes too long to wake up (awake count 1)
>> ath10k: src_ring ffff88020c0d0a00:  write_index is out of bounds: 4294967295  nentries_mask: 15.
>> ath10k: dest_ring ffff88020db2c000:  write_index is out of bounds: 4294967295  nentries_mask: 511.
>> ath10k: dest_ring ffff880210d56400:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff880210d57600:  write_index is out of bounds: 4294967295  nentries_mask: 31.
>> ath10k: src_ring ffff88020fe70000:  write_index is out of bounds: 4294967295  nentries_mask: 2047.
>> ath10k: src_ring ffff880212989b40:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: dest_ring ffff880212989960:  write_index is out of bounds: 4294967295  nentries_mask: 1.
>> ath10k: Failed to get pcie state addr: -5
>> ath10k: early firmware event indicated
>> ------------[ cut here ]------------
>> WARNING: at /home/greearb/git/linux.wireless-testing/drivers/net/wireless/ath/ath10k/ce.c:771 ath10k_ce_per_engine_service+0x53/0x1b4 [ath10k_pci]()
>> ....
>> (it hits the warning case about 5-6 times and then seems to quiesce OK).
>
> I haven't seen this myself so it might be a hw problem, but difficult to
> say.
>
>> +	/* On v1 hardware at least, setup can fail, causing ce_id_to_state to
>> +	 * be cleaned up, but this method is still called a few times.  Check
>> +	 * for NULL here so we don't crash.  Probably a better fix is to stop
>> +	 * the ath10k_pci_ce_tasklet sooner.
>> +	 */
>> +	if (WARN_ONCE(!ce_state, "ce_id_to_state[%i] is NULL\n", ce_id))
>> +		return;
>> +
>> +	ctrl_addr = ce_state->ctrl_addr;
>> +
>
> The tests you add look like workarounds. I would prefer to try to fix these
> by going to the source of the problem. Maybe we should add
> ath10k_pci_wake() and ath10k_do_pci_wake()?

These are workarounds, but you should not let a bad piece of hardware or
firmware crash the entire OS just because you don't want to sanity-check
the values you get from the firmware.  Perhaps there is a better fix for
the code above, but the warning splat should still provide the incentive
to get it right, without crashing the OS in the meantime.
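The sanity checking argued for above can be sketched in plain user-space C. This is a minimal illustration under stated assumptions, not ath10k code: `struct ce_state`, `pci_read_failed()`, `ring_index_valid()` and `service_engine()` are hypothetical names. It also assumes that the out-of-bounds write_index of 4294967295 in the log is 0xFFFFFFFF, the all-ones value that PCI reads commonly return when the device is not responding (which would fit the "currently in D3" message).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: a read from a PCI device that is not responding (e.g. stuck
 * in D3) returns all ones, 0xFFFFFFFF == 4294967295. */
static bool pci_read_failed(uint32_t value)
{
	return value == 0xFFFFFFFFu;
}

/* Hypothetical stand-in for the driver's per-copy-engine state. */
struct ce_state {
	uint32_t ctrl_addr;
	uint32_t nentries_mask; /* ring entries - 1 (power-of-two ring) */
};

/* An index read back from hardware is only usable if it fits the ring.
 * The all-ones failed read fails this check for any ring size seen in
 * the log (masks 1 through 2047). */
static bool ring_index_valid(uint32_t index, uint32_t nentries_mask)
{
	return index <= nentries_mask;
}

/* Hypothetical per-engine service routine: return -1 instead of
 * dereferencing state that failed setup never created (or already freed).
 * The message is printed only once, loosely mirroring WARN_ONCE. */
static int service_engine(struct ce_state *const *id_to_state,
			  size_t num_ce, unsigned int ce_id)
{
	static bool warned;
	const struct ce_state *state;

	if (ce_id >= num_ce)
		return -1;

	state = id_to_state[ce_id];
	if (!state) {
		if (!warned) {
			fprintf(stderr, "ce_id_to_state[%u] is NULL\n", ce_id);
			warned = true;
		}
		return -1;
	}

	/* Normal servicing would read ring indices via state->ctrl_addr
	 * here and validate them with ring_index_valid() before use. */
	return 0;
}
```

The guard turns a NULL dereference into a logged, recoverable early return; as Kalle notes, it does not address why setup failed or why the service routine keeps running after cleanup.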


> Can you enable a few debug logs, like ATH10K_DBG_PCI, and post them? That
> would give more hints about where things are going wrong.


Yes, I can do that.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com




More information about the ath10k mailing list