[PATCH] ath10k: fix system hang at qca99x0 probe on x86 platform

Michal Kazior michal.kazior at tieto.com
Tue Jul 19 22:36:02 PDT 2016

On 19 July 2016 at 17:25, Manoharan, Rajkumar <rmanohar at qti.qualcomm.com> wrote:
> On June 30, 2016 12:39 PM, Michal Kazior <michal.kazior at tieto.com> wrote:
>> On 29 June 2016 at 18:35, Manoharan, Rajkumar <rmanohar at qti.qualcomm.com> wrote:
>>>>> On 29.06.2016 at 16:04, Sebastian Gottschall wrote:
>>>>> this fix will crash QCA9980 on QCA IPQ8064 CPU-based systems.
>>>>> so please rework it, or leave it out.
>>>>> note:
>>>>> maybe the limit of 256kb is too low for that card
>>>> by the way. 512 works
>> I think this suggests the problem isn't about memory chunk size limit
>> per se but some kind of bug in address/offset logic in fw or hw.
>> DMA coherent and single-map addresses use completely different ranges
>> in many cases. Perhaps some MSBs are not properly handled in fw or hw.
>> I recall there is a magic macro through which target device accesses
>> host memory so maybe that's a good place to look to better understand
>> the problem?
> Michał,
> Could you please shed some light on this issue? It seems this issue is popping up
> more frequently and there are multiple threads for this issue.
> "Anyone brought up 9984 NIC on x86-64?"
> "AR9882 IOMMU faults"

I think IOMMU faults were solved by using DMA_BIDIRECTIONAL, no?

Anyway, FWIW there's this concept in firmware called dma_local_bits
and A_DMA_ADDR()/A_CPU_ADDR(). Not sure if it's relevant but may be
worth checking out in detail.
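
To make the "some MSBs are not properly handled" hypothesis concrete,
here is a purely illustrative userspace sketch. It is NOT the firmware's
A_DMA_ADDR()/A_CPU_ADDR() and does not claim to describe what
dma_local_bits really means; SKETCH_LOCAL_BITS and both helper names are
made up, and the width of 31 bits is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: number of low address bits the target preserves. */
#define SKETCH_LOCAL_BITS 31u

/* Hypothetical translation that silently drops the bits above
 * SKETCH_LOCAL_BITS when the target forms a host address. */
static uint64_t sketch_target_view(uint64_t host_addr)
{
	assert(SKETCH_LOCAL_BITS < 64);
	uint64_t mask = (UINT64_C(1) << SKETCH_LOCAL_BITS) - 1;

	return host_addr & mask;
}

/* True if the target would access a different address than the one
 * the host handed out. */
static bool sketch_addr_is_mangled(uint64_t host_addr)
{
	return sketch_target_view(host_addr) != host_addr;
}
```

Under that assumption, a buffer at 0x10000000 is seen correctly while one
at 0xC0000000 aliases to 0x40000000. If something like this were going on,
it would depend on which address range an allocation lands in rather than
on the chunk size as such.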

>> I recall Ben mentioned he worked around the problem by enabling
>> IOMMU/VT-d on his system. This could either prevent the device from
>> doing bad things, or effectively change the DMA address ranges that
>> are handed out to the driver, or change PCI controller behavior in
>> some way.
>>> Thanks a lot Sebastian. Let me confirm the same on x86 and will update the change.
>> Just changing the memory chunk size limit blindly is bad and
>> Sebastian's crash has proven it. 512 may seem to work now but it may
>> fail with other 10.4 firmware revisions or make x86 hang in other
>> cases.
> Even with the current logic, if the memory chunk allocation fails for a bigger size, then it
> tries to allocate smaller chunks. So if smaller chunks cause unexpected behaviour, that
> applies to the existing logic as well, no?

We still don't know *why* using non-coherent memory causes problems.
Changing chunk size limit seems to alter the behavior in some
unpredictable ways, yes, but it's really hard to tell if the "try
smaller chunk sizes" *itself* introduces any problems.
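
For reference, the general shape of such a "try smaller chunks" fallback
can be sketched in userspace C as below. This is a simplified model, not
the driver's code: sketch_try_alloc() stands in for the real DMA
allocation, sketch_alloc_limit is an invented failure threshold, and all
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator: anything above this fails. */
static size_t sketch_alloc_limit = 256 * 1024;

static int sketch_try_alloc(size_t len)
{
	return (len > 0 && len <= sketch_alloc_limit) ? 0 : -1;
}

/* Try to place num_units units of unit_len bytes into one chunk.
 * On failure, halve the unit count and retry. Returns the number of
 * units that fit into this chunk, or -1 if not even one unit fits. */
static int sketch_alloc_chunk(uint32_t num_units, uint32_t unit_len)
{
	assert(unit_len > 0);

	while (num_units) {
		if (sketch_try_alloc((size_t)num_units * unit_len) == 0)
			return (int)num_units;
		num_units /= 2;
	}
	return -1;
}

/* Cover the whole request with as many chunks as it takes.
 * Returns the number of chunks used, or -1 on failure. */
static int sketch_alloc_host_mem(uint32_t num_units, uint32_t unit_len)
{
	int chunks = 0;

	while (num_units) {
		int got = sketch_alloc_chunk(num_units, unit_len);

		if (got < 0)
			return -1;
		num_units -= (uint32_t)got;
		chunks++;
	}
	return chunks;
}
```

In this model, sketch_alloc_host_mem(128, 4096) ends up with two 256 KB
chunks and sketch_alloc_host_mem(256, 4096) with five chunks of uneven
sizes. The point is only that the resulting layout depends on where
allocations start to fail, so moving the limit changes what is handed to
the firmware even though the fallback loop itself stays the same.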


