hacked 4.4.6+, 10.4.3 firmware, Running out of ring-index for pipe-id 3 (WMI).

Michal Kazior michal.kazior at tieto.com
Thu Mar 31 22:18:48 PDT 2016


On 31 March 2016 at 18:44, Ben Greear <greearb at candelatech.com> wrote:
> On 03/30/2016 11:51 PM, Michal Kazior wrote:
>>
>> On 29 March 2016 at 17:48, Ben Greear <greearb at candelatech.com> wrote:
>>>
>>> On 03/29/2016 01:05 AM, Michal Kazior wrote:
>>>>
>>>>
>>>> On 28 March 2016 at 21:01, Ben Greear <greearb at candelatech.com> wrote:
>>>>>
>>>>>
>>>>> I'm seeing the ring-full messages below when running 35 stations on
>>>>> modified 10.4.3 firmware.  I also have serial console logging enabled, so
>>>>> things are running a bit slow...this seems to exacerbate the issue.
>>>>>
>>>>> [   91.108923] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1  credit-flow-enabled: 1
>>>>> [   91.108932] ath10k_pci 0000:05:00.0: could not request stats (type 128 ret -105)
>>>>> [   91.108942] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3  n_items: 1 pipe-id: 3
>>>>> [   91.108944] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1  credit-flow-enabled: 1
>>>>> [   91.108952] ath10k_pci 0000:05:00.0: could not request stats (type 1 ret -105)
>>>>> [   91.108953] ath10k_pci 0000:05:00.0: failed to get fw stats for ethtool: -105
>>>>> [   91.109039] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3  n_items: 1 pipe-id: 3
>>>>> [   91.109041] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1  credit-flow-enabled: 1
>>>>> [   91.109050] ath10k_pci 0000:05:00.0: could not request stats (type 128 ret -105)
>>>>> [   91.109060] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3  n_items: 1 pipe-id: 3
>>>>> [   91.109062] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1  credit-flow-enabled: 1
>>>>> [   91.109070] ath10k_pci 0000:05:00.0: could not request stats (type 1 ret -105)
>>>>> [   91.109072] ath10k_pci 0000:05:00.0: failed to get fw stats for ethtool: -105
>>>>> [   91.109157] ath10k_pci 0000:05:00.0: hif-tx-sg, full, nentries_mask: 0x1f write_idx: 2 sw-idx: 3  n_items: 1 pipe-id: 3
>>>>> [   91.109160] ath10k_pci 0000:05:00.0: htc failed hif-tx-sq: -105 eid: 2 credits: 1 ep->tx_credits: 1  credit-flow-enabled: 1
>>>>>
>>>>>
>>>>> I am struggling to understand how the pipe can be full since we have
>>>>> tx-credits logic
>>>>> enabled for the WMI pipe.
>>>>>
>>>>> Any suggestions on what sort of bugs could cause this?
>>>>>
>>>>> And, should the ath10k_wmi_cmd_send retry when we get a -105 return
>>>>> code in hopes it will free up shortly instead of just failing and leaving
>>>>> the system in invalid state?
>>>>
>>>>
>>>>
>>>> It probably shouldn't. As you've pointed out HTC tx credits should
>>>> prevent this in the first place. If you see -105 it means something is
>>>> really broken and needs to be fixed properly.
>>>>
>>>> One thing that comes to mind is that CE, for whatever reason, would
>>>> have to stop completing CE ring items. Are you running with MSI? 1 or
>>>> multiple interrupts? Did you try forcing legacy interrupt mode to rule
>>>> out MSI problems?
>>>>
>>>> You could add debug messages to see if the HTC-WMI CE ring gets tx
>>>> completions properly.
>>>
>>>
>>>
>>> I don't think I'm using MSI.  Could it be that whatever logic that should
>>> be processing the tx-completions is just running slower than whatever is
>>> handling the WMI messages (and credits)?
>>
>>
>> Your WMI command queue is limited to HTC Tx credits (2, right?). This
>> means you can enqueue, in practice, 2 CE items to WMI's CE Tx pipe.
>> Once you've done that you have to wait until next interrupt carrying
>> HTC Rx message with Tx Credit Update. If you get this it implies FW
>> received your WMI commands which implies WMI's CE Tx pipe was updated
>> (and at least the 2 CE's associated with your WMI commands have been
>> consumed/completed). Even if you assume CE processing ordering is
>> reversed (i.e. HTC Rx gets processed before HTC Tx completions are)
>> you still should be able to have enqueued no more than 4 CE items at a
>> time as far as WMI is concerned.
>>
>> Now, if you assume MSI-range (multiple MSI interrupts; a vector) is
>> enabled, you can service each CE pipe in a separate interrupt and
>> tasklet. This could, in theory, result in some weird race as HTC Tx
>> credits and CE Tx pipe completions are not guaranteed to be
>> serialized.
>>
>> Or maybe you're using some forced WMI commands in your fork and
>> disregard Tx credits in some cases? This could explain the problem
>> even when running with a single interrupt.
>
>
> So, I am using MSI-X, I guess?
>
> # dmesg|grep -i msi
> [65284.853372] ath10k_pci 0000:05:00.0: pci irq msi-x interrupts 13 irq_mode 0 reset_mode 0

Yep. This at least makes it possible for this weird problem to come
into existence. However, I still find it a little hard to believe that
tasklets would be scheduled this badly. Maybe the device doesn't assert
interrupts properly, as Adrian suggested? Or maybe they are not mapped
properly? I think you're actually the first one to exercise MSI-range
support on qca99x0.
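
The numbers in the quoted "hif-tx-sg, full" line can be checked by hand.
Below is a minimal sketch in Python (not driver code) of the ring
arithmetic. It assumes the free-slot check works like the driver's
CE_RING_DELTA macro as I remember it, i.e. (to - from) & mask, with one
slot kept unused. The function names are made up for illustration, and
the credit count of 2 is the figure guessed earlier in this thread, not
something read from the log.

```python
# Values copied from the quoted "hif-tx-sg, full" log line.
NENTRIES_MASK = 0x1F   # ring size is mask + 1 = 32 slots
WRITE_IDX = 2          # next slot the host would write
SW_IDX = 3             # next slot the host expects to reap as completed
N_ITEMS = 1            # items the failing send wanted to post

# Assumption from the thread: 2 HTC tx credits for WMI, and at worst
# twice that many CE items in flight if HTC Rx is handled before the
# CE Tx completions.
HTC_CREDITS = 2
EXPECTED_MAX_IN_FLIGHT = 2 * HTC_CREDITS


def ring_delta(mask, from_idx, to_idx):
    """Distance from from_idx to to_idx, modulo the ring size."""
    return (to_idx - from_idx) & mask


def free_slots(mask, write_idx, sw_idx):
    """Slots still available to the host; one slot is always left empty."""
    return ring_delta(mask, write_idx, sw_idx - 1)


def outstanding(mask, write_idx, sw_idx):
    """Entries posted by the host but not yet reaped as tx completions."""
    return ring_delta(mask, sw_idx, write_idx)


free = free_slots(NENTRIES_MASK, WRITE_IDX, SW_IDX)
pending = outstanding(NENTRIES_MASK, WRITE_IDX, SW_IDX)

print("free slots:", free)
print("un-reaped entries:", pending)
print("ring full for this send:", free < N_ITEMS)
print("exceeds credit-based bound:", pending > EXPECTED_MAX_IN_FLIGHT)
```

If that model is right, write_idx 2 and sw-idx 3 on a 32-entry ring
mean the send index has wrapped all the way around to just behind the
completion index, so far more entries are un-reaped than the credit
logic should ever allow. That would point at tx completions not being
processed rather than at slow scheduling.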


Michał


