Unable to read firmware registers on crash?

Tue Feb 3 05:57:45 PST 2015

On 02/02/2015 10:31 PM, Michal Kazior wrote:
> On 2 February 2015 at 18:03, Ben Greear <greearb at candelatech.com> wrote:
>> On 02/02/2015 04:11 AM, Michal Kazior wrote:
>>> On 1 February 2015 at 18:46, Ben Greear <greearb at candelatech.com> wrote:
>>>> I am trying to debug a case where the firmware occasionally crashes
>>>> and the driver cannot read any crash dump to debug the problem further.
>>>>
>>>> Any idea what the problem might be, and how I could read info from
>>>> the firmware (or hack the firmware to deliver the crash info by some
>>>> other means... maybe through a previously reserved piece of memory on
>>>> the host)?
>>>>
>>>> [147334.397148] ath10k: firmware crashed! (uuid ff405224-b2a4-493d-b619-19ad8152d190)
>>>> [147334.404808] ath10k: hardware name qca988x hw2.0 version 0x4100016c
>>>> [147334.411111] ath10k: firmware version: 10.1.467-ct-community-full-013
>>>> [147334.429603] ath10k: failed to read diag value at 0x1300804: -16
>>>> [147334.435647] ath10k: failed to read FW dump area address: -16
>>>
>>> I see this rarely, mostly when the device goes bonkers, at which point
>>> warm reset doesn't work anymore and I'm forced to either risk a cold
>>> reset causing a platform lockup or re-insert the card in the express
>>> card slot.
>>>
>>> I haven't played with this so I don't know if DMA is still possible
>>> (it might not). MMIO should be operational and there should be some
>>> scratch registers chilling around... ;-)
>>
>> Normally, we read the crash dump over a CE pipe?
>
> Correct. CE7, the so-called diagnostic window, is used.
>
>
>> So, if the IRQ handlers are thoroughly busted on the target, then
>> that could be the reason why we cannot read the crash dump?
>
> Incorrect. From what I understand, the diagnostic window CE has
> built-in logic for fetching chunks of target RAM on each request.
> This means that even if the target program has masked IRQs and is
> spinning in a while (1) {}, the host is still able to access its
> memory.
>
> If you're unable to read the dump, it means that the CE has crashed
> (whatever that _actually_ means) and thus the built-in logic for the
> diagnostic window goes down as well. I never managed to recover from
> a CE crash alone without resorting to the risky target cold reset.

Ok, that makes sense... I was curious how the dump could be read with
the while (1) assert loop going.

>> If I were to use MMIO, what sort of things could I get
>> access to?  Just registers, you think?  I could perhaps tweak
>> the firmware assert routine to scribble the crash register
>> contents into a specific place, perhaps a bit at a time,
>> so that the host could read them if the normal crash dump read
>> fails?
>
> I was thinking that instead of doing a while (1) {} loop towards the
> end of the assert routine, you could put a busy loop there and
> introduce a simple interaction within it via a bunch of MMIO registers
> that are software-writable (e.g. scratch registers) and aren't used by
> the MAC-related hw. With that you could have a fallback way to
> "stream" the crash dump data by putting one data word at a time in a
> register, with the host ACKing each one.
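
A minimal sketch of the handshake described above, simulated in plain C
so that it can be run on a host without hardware. Everything named here
is an assumption for illustration: the three "registers" (data, seq,
ack), the struct layout, and the function names are hypothetical and do
not come from the firmware or from ath10k. On real hardware the fields
would be software-writable MMIO scratch registers, and ordering between
the data and seq writes would need care.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scratch registers, simulated here as ordinary memory. */
struct scratch_regs {
    volatile uint32_t data; /* target -> host: one crash dump word */
    volatile uint32_t seq;  /* target -> host: bumped for each new word */
    volatile uint32_t ack;  /* host -> target: last seq value consumed */
};

/*
 * Target side: one pass of the busy loop that would replace while (1) {}.
 * Publishes the next word only after the host has ACKed the previous one.
 * Returns the number of words published so far.
 */
static size_t target_poll(struct scratch_regs *r, const uint32_t *dump,
                          size_t len, size_t sent)
{
    if (sent < len && r->ack == r->seq) {
        r->data = dump[sent];
        r->seq = r->seq + 1; /* written after data so the host sees both */
        sent++;
    }
    return sent;
}

/*
 * Host side: one poll. If a new word is pending (seq != ack), copy it
 * out and ACK it. Returns the number of words received so far.
 */
static size_t host_poll(struct scratch_regs *r, uint32_t *out,
                        size_t cap, size_t got)
{
    uint32_t seq = r->seq;

    if (seq != r->ack && got < cap) {
        out[got++] = r->data;
        r->ack = seq;
    }
    return got;
}

/* Drive both sides in lock step until the dump (or the buffer) is done. */
static size_t stream_dump(const uint32_t *dump, size_t len,
                          uint32_t *out, size_t cap)
{
    struct scratch_regs regs = { 0, 0, 0 };
    size_t sent = 0, got = 0;

    assert(dump != NULL && out != NULL);
    while (got < len && got < cap) {
        sent = target_poll(&regs, dump, len, sent);
        got = host_poll(&regs, out, cap, got);
    }
    return got;
}
```

Calling stream_dump() with a source array and an equally sized output
buffer copies the words across one at a time. Using a sequence counter
rather than a single "valid" bit is a design choice made for this
sketch: it lets the host tell a repeated word apart from a stale one.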

That sounds reasonable.  Do you know of any example code that accesses
a target register or two with the MMIO logic on the host?  I think the
firmware has some examples of how to fiddle with registers, so I think
I can take a stab at that part...
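
For reference, a rough sketch of what host-side 32-bit MMIO access
looks like, written as portable C so it can be tried against an
ordinary array standing in for the mapped BAR. The helper names and
the two register offsets are hypothetical; the real offsets depend on
the chip's register map, which this thread does not give. In the Linux
kernel the equivalent accesses would go through ioread32()/iowrite32()
on an ioremap()ed region rather than raw pointer dereferences.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical offsets of two scratch registers within the mapping. */
#define SCRATCH_DATA_OFFSET 0x0u
#define SCRATCH_ACK_OFFSET  0x4u

/* Read one 32-bit register at 'offset' bytes from the mapped base. */
static inline uint32_t mmio_read32(volatile void *base, uint32_t offset)
{
    assert((offset & 3u) == 0); /* registers are 32-bit aligned */
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

/* Write one 32-bit register at 'offset' bytes from the mapped base. */
static inline void mmio_write32(volatile void *base, uint32_t offset,
                                uint32_t value)
{
    assert((offset & 3u) == 0);
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

/* Fetch the word the target left in the data register, then ACK it. */
static uint32_t read_and_ack(volatile void *base, uint32_t token)
{
    uint32_t word = mmio_read32(base, SCRATCH_DATA_OFFSET);

    mmio_write32(base, SCRATCH_ACK_OFFSET, token);
    return word;
}
```

The volatile qualifiers keep the compiler from caching or dropping the
accesses, which matters once the pointer refers to device registers
instead of RAM.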

Thanks,
Ben

>
>
> Michał

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com