[PATCH v5 0/7] ath10k: firmware crash dump

Sat Aug 9 09:51:49 PDT 2014

On 08/08/2014 10:50 PM, Kalle Valo wrote:
> Ben Greear <greearb at candelatech.com> writes:
>
>> On 08/08/2014 01:28 PM, Kalle Valo wrote:
>>> Hi,
>>>
>>> here's my reworked Ben's patchset adding firmware crash dump support to ath10k.
>>> Unfortunately this crashes when reading the stack dump from the firmware but
>>> time run out for me to fix that and I wanted to send this for comments anyway.
>>>
>>> I did quite a lot of changes, basically to simplify the code, remove ifdefs and
>>> so on. Here's some sort of list what I did:
>>>
>>> * dump_data->tv_sec and tv_nsec to 64 bits (because long can be 32 bits
>>>    on some platforms)
>>
>> I did u32 on purpose because I know how to do ntohl, htonl if I need
>> to flip the machine order in user-space.
>
> On my 32-bit box I got a compiler warning for this and that's why I
> changed it to u64.

Casting could have fixed that I imagine.

>> Is there something similar for 64-bit numbers?
>
> At least I haven't heard anything. But it should be easy to implement on
> your own.

Sure...it's not a big deal, likely there will be at most two different apps
that ever decode this anyway (mine and qca's).

>>> * latest crash dump is always stored (instead of the oldest unread)
>>
>> At least with the kernel, the first crash is normally the most useful,
>> so I figured the same would be true of the firmware (a firmware crash
>> is excellent way to find hidden bugs in the driver, so upon crash/reload,
>> it is more likely that the driver will be screwed up and thus mis-configure
>> firmware...)
>
> Think of an AP with 3 month uptime where firmware crashes once in June
> and second time in August and when you retrieve the crash dump you get
> the one from June. I find that very unintuitive.

AP should have gathered the crash the first time, but again, not a big deal,
as I would gather the crash logs as soon as they happen anyway.

>>> * what should ath10k_fw_error_dump_open() do if firmware hasn't
>>>    crashed? check crashed_since_read and return zero len file? or an
>>>    error code? -ENOMSG?
>>
>> Maybe a mostly empty crash log with just the header, should be easy enough
>> to figure out it's not really a crash.  If we use udev to gather these,
>> it is likely to be stupid shell scripts doing a lot of the work, so
>> being clever on return codes might not be helpful.
>
> But with cp in that case you also get an empty file, right? That way
> it's easy to detect that there's no crash dump available.
>
> What is the benefit from providing an empty header to user space? I
> don't get that.

An empty file could be any sort of error or mis-configuration
...but an empty header means the infrastructure is working properly.

I think with all of this, it mostly a matter of opinion, and whatever
you choose should be fine with me.  If we find any serious issues, we
can of course fix it later.

>> I'll go dig through the patches in a bit..have to figure out why systemd
>> is fcked up on my lab machine first. :P
>
> But wasn't systemd supposed to fix all of our problems? ;)

On Fedora 19, the latest update forgot how to enable/disable programs.  It must
have a zero-regression-test policy as well as it's other issues :P

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com