[PATCH 2/2] ath10k: do not use coherent memory for tx buffers

Mon Nov 23 10:50:46 PST 2015

On 11/23/2015 10:18 AM, Felix Fietkau wrote:
> On 2015-11-23 18:25, Peter Oh wrote:
>> Hi,
>>
>> Have you measured the peak throughput?
>> The pre-allocated coherent memory concept was introduced as one of the peak
>> throughput improvements.
> It's all still pre-allocated and pre-mapped.
Right. I misjudged it from the title.
>
>> IIRC, dma_map_single takes about 4 us on Cortex A7, and dma_unmap_single
>> also takes time to invalidate the cache.
> That's why I didn't put a map/unmap in the hot path. There is only a
> cache sync there. With coherent memory, every single word access blocks
> until the transaction is complete. With cached/mapped memory, the CPU
> can fill the cachelines first, then flush them in one go. This usually
> ends up being faster than working with coherent memory directly.
>
>> Please share your throughput numbers before and after, so I don't need to
>> worry about performance degradation.
> I don't have an ideal setup for tput tests at the moment, so I can't
> give you any numbers.
Could you share any rough numbers?
>   However, on the device that I'm testing on
> (IPQ806x based), this patch makes the difference between working and
> non-working wifi, fixing the regression introduced by your pre-allocated
> coherent memory patch.
Thank you for catching this and for the fix.
Btw, the regression could also be fixed by using GFP_KERNEL instead of
GFP_DMA, right?
>
> - Felix
Thanks,
Peter