[PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency
Vijaya Kumar K-1
vijay.kilari at stericsson.com
Mon Jun 27 12:34:52 EDT 2011
Below are the timings for clean and flush.
Size (bytes)  Clean T1 (ns)  Dirty clean T2 (ns)  Flush T3 (ns)  Dirty flush T4 (ns)
4096          30517          30517                30517          30517
8192          30517          30517                30517          30517
16384         30518          30518                30518          30518
32768         30518          30518                30518          61035  <--
36864         61036          61036                61035          61035
65536         91553          91553                91553          91553
131072        183106         183106               183106         183106
Full cache    30518          30518                30518          30518  <--
Based on the above values, 32768 bytes is the breakeven size for cleaning/flushing: beyond it, operating on the full D-cache is cheaper than operating on the range.
With a 32 KB DLIMIT, I noticed a small reduction of about 1 fps in the
skiamark profile after this change. This could be because the full flush or
clean causes more cache misses later in execution.
With a 64 KB DLIMIT, skiamark performance degrades further,
so I think 32 KB is a good value.
However, problems are seen in the Android UI: small artifacts appear on
UI widgets during video playback.
These artifacts are not seen if the clean is called on each CPU.
I also found that implementing clean_all/flush_all APIs in cache-v7.S
(assembly) so that they execute on each CPU takes some effort,
and hence this work was set aside.
I have also not investigated why, in both cases, a flush-all performed on
both CPUs always works.
Thanks & Regards
From: Linus Walleij [mailto:linus.walleij at linaro.org]
Sent: Monday, June 27, 2011 5:30 PM
To: Russell King - ARM Linux; Srinidhi KASAGAR; Vijaya Kumar K-1
Cc: Per Forlin; Nicolas Pitre; Chris Ball; linaro-dev at lists.linaro.org; linux-mmc at vger.kernel.org; linux-arm-kernel at lists.infradead.org; Robert Fekete
Subject: Re: [PATCH v6 00/11] mmc: use nonblock mmc requests to minimize latency
On Mon, Jun 27, 2011 at 12:02 PM, Russell King - ARM Linux
<linux at arm.linux.org.uk> wrote:
> The next thing to think about in DMA-land is whether we should total up
> the size of the SG list and choose whether to flush the individual SG
> elements or do a full cache flush. There becomes a point where the full
> cache flush becomes cheaper than flushing each SG element individually.
We noticed that even for a single (large) buffer, above a certain size
threshold, flushing individual lines becomes more expensive
than flushing the entire cache.
I asked colleagues to look into implementing this threshold in
arch/arm/mm/cache-v7.S, but I think they ran into trouble and
eventually had to give up on it.
Vijay or Srinidhi, can you share your findings?