MMC quirks relating to performance/lifetime.

Arnd Bergmann arnd at arndb.de
Sun Feb 20 10:23:03 EST 2011


On Sunday 20 February 2011 06:56:39 Andrei Warkentin wrote:
> On Sat, Feb 19, 2011 at 5:20 AM, Arnd Bergmann <arnd at arndb.de> wrote:

> > The numbers you see here are taken over multiple runs. Do you see a lot
> > of fluctuation when doing this with --count=1?
> >
> 
> Yep. Quite a bit.
> 
> # ./flashbench -c 1 -A -b 1024 /dev/block/mmcblk0p9
> write align 8388608	pre 4.52ms	on 7.58ms	post 3.93ms	diff 3.36ms
> write align 4194304	pre 5.97ms	on 8.69ms	post 4.36ms	diff 3.53ms
> write align 2097152	pre 3.57ms	on 7.96ms	post 4.6ms	diff 3.88ms
> write align 1048576	pre 5.33ms	on 27.4ms	post 4.88ms	diff 22.3ms
> write align 524288	pre 49.3ms	on 31.4ms	post 14.9ms	diff -679265
> write align 262144	pre 39.7ms	on 38.3ms	post 5.27ms	diff 15.8ms
> write align 131072	pre 33.8ms	on 45.4ms	post 5.26ms	diff 25.9ms
> write align 65536	pre 34.4ms	on 40.9ms	post 3.3ms	diff 22.1ms
> write align 32768	pre 30.2ms	on 44.8ms	post 5.13ms	diff 27.1ms
> write align 16384	pre 44.5ms	on 5.05ms	post 33.3ms	diff -338542
> write align 8192	pre 25.5ms	on 70.6ms	post 25.3ms	diff 45.2ms
> write align 4096	pre 4.89ms	on 4.47ms	post 5.29ms	diff -623390
> write align 2048	pre 4.88ms	on 4.89ms	post 5.2ms	diff -155781
> # ./flashbench -c 1 -A -b 1024 /dev/block/mmcblk0p9
> write align 8388608	pre 4.68ms	on 9.06ms	post 5.14ms	diff 4.15ms
> write align 4194304	pre 4.37ms	on 7.49ms	post 4.59ms	diff 3.01ms
> write align 2097152	pre 23.7ms	on 1.9ms	post 14.8ms	diff -173218
> write align 1048576	pre 14.8ms	on 19.9ms	post 4.75ms	diff 10.2ms
> write align 524288	pre 20.2ms	on 24.9ms	post 10.7ms	diff 9.46ms
> write align 262144	pre 20.2ms	on 3.01ms	post 20.1ms	diff -171062
> write align 131072	pre 25.9ms	on 24.9ms	post 9.85ms	diff 7.06ms
> write align 65536	pre 15.5ms	on 30.3ms	post 2.95ms	diff 21.1ms
> write align 32768	pre 27.3ms	on 19.1ms	post 5.86ms	diff 2.5ms
> write align 16384	pre 25.4ms	on 55.9ms	post 12.7ms	diff 36.9ms
> write align 8192	pre 4.8ms	on 102ms	post 9.47ms	diff 94.8ms
> write align 4096	pre 4.92ms	on 5.16ms	post 4.98ms	diff 207µs
> write align 2048	pre 4.64ms	on 4.92ms	post 5.45ms	diff -121860
> # ./flashbench -c 1 -A -b 1024 /dev/block/mmcblk0p9
> write align 8388608	pre 15.8ms	on 9.39ms	post 4.68ms	diff -854295
> write align 4194304	pre 4.76ms	on 7.54ms	post 3.82ms	diff 3.24ms
> write align 2097152	pre 19.9ms	on 9.73ms	post 4.44ms	diff -244517
> write align 1048576	pre 14.5ms	on 19.1ms	post 5.21ms	diff 9.23ms
> write align 524288	pre 24.9ms	on 29ms	post 5.89ms	diff 13.6ms
> write align 262144	pre 24.9ms	on 2.41ms	post 20.8ms	diff -204328
> write align 131072	pre 25.6ms	on 30ms	post 4.84ms	diff 14.8ms
> write align 65536	pre 26.4ms	on 24.4ms	post 6.16ms	diff 8.12ms
> write align 32768	pre 15ms	on 30.6ms	post 15.4ms	diff 15.4ms
> write align 16384	pre 16.1ms	on 45.4ms	post 16.5ms	diff 29.1ms
> write align 8192	pre 5.88ms	on 107ms	post 5.45ms	diff 101ms
> write align 4096	pre 5.17ms	on 5.78ms	post 4.83ms	diff 778µs
> write align 2048	pre 3.99ms	on 5.27ms	post 3.97ms	diff 1.29ms
> # ./flashbench -c 1 -A -b 1024 /dev/block/mmcblk0p9
> write align 8388608	pre 16.1ms	on 8.37ms	post 5.44ms	diff -241222
> write align 4194304	pre 4.07ms	on 7.27ms	post 3.89ms	diff 3.29ms
> write align 2097152	pre 24.2ms	on 18.5ms	post 5.63ms	diff 3.59ms
> write align 1048576	pre 4.08ms	on 18.9ms	post 5.46ms	diff 14.1ms
> write align 524288	pre 25.1ms	on 28ms	post 14.6ms	diff 8.13ms
> write align 262144	pre 15.8ms	on 30ms	post 5.4ms	diff 19.4ms
> write align 131072	pre 24.7ms	on 30.8ms	post 4.43ms	diff 16.2ms
> write align 65536	pre 5ms	on 40.5ms	post 5.95ms	diff 35.1ms
> write align 32768	pre 24.7ms	on 30.6ms	post 4.92ms	diff 15.8ms
> write align 16384	pre 25.2ms	on 132ms	post 10.2ms	diff 114ms
> write align 8192	pre 7.64ms	on 111ms	post 9.18ms	diff 102ms
> write align 4096	pre 5.11ms	on 3.92ms	post 5.4ms	diff -134159
> write align 2048	pre 3.92ms	on 4.41ms	post 4.51ms	diff 196µs

Every value is the average of eight measurements, so there are probably
some that include the 100ms garbage collection, and others that don't.
I'm more confused about this now than I was before.
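
To illustrate how a single slow write can skew an eight-sample average,
here is a minimal sketch. The 5 ms and 100 ms figures are round,
hypothetical values chosen to resemble the output above; they are not
measurements.

```python
def mean_ms(samples):
    """Arithmetic mean of a list of latencies in milliseconds."""
    return sum(samples) / len(samples)

# Hypothetical samples: eight ordinary writes of about 5 ms each.
no_gc = [5.0] * 8

# Hypothetical samples: seven ordinary writes plus one write that
# waited for a roughly 100 ms garbage collection.
one_gc = [5.0] * 7 + [100.0]

print(mean_ms(no_gc))   # 5.0
print(mean_ms(one_gc))  # 16.875
```

Under these assumptions, one garbage collection in eight samples is enough
to move the reported average from about 5 ms to about 17 ms, which is in the
range of some of the "pre" values quoted above.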

> > Also, does the same happen with other blocksizes, e.g. 4096 or 8192, passed
> > to flashbench?
>
> # echo 0 > /sys/block/mmcblk0/device/page_size
> # ./flashbench -A -b 1024 /dev/block/mmcblk0p9
> write align 65536	pre 3.33ms	on 6.57ms	post 3.65ms	diff 3.08ms
> write align 32768	pre 3.68ms	on 6.6ms	post 3.7ms	diff 2.91ms
> write align 16384	pre 3.64ms	on 97.6ms	post 3.26ms	diff 94.2ms
> write align 8192	pre 3.49ms	on 115ms	post 3.62ms	diff 112ms
> write align 4096	pre 3.91ms	on 3.91ms	post 3.9ms	diff 360ns
> write align 2048	pre 3.92ms	on 3.92ms	post 3.92ms	diff -1374ns
> # ./flashbench -A -b 2048 /dev/block/mmcblk0p9
> write align 65536	pre 4.02ms	on 7.22ms	post 4.14ms	diff 3.14ms
> write align 32768	pre 4ms	on 7.07ms	post 3.95ms	diff 3.1ms
> write align 16384	pre 3.66ms	on 106ms	post 3.4ms	diff 102ms
> write align 8192	pre 3.56ms	on 106ms	post 3.36ms	diff 103ms
> write align 4096	pre 3.61ms	on 4.1ms	post 4.35ms	diff 117µs
> # ./flashbench -A -b 4096 /dev/block/mmcblk0p9
> write align 65536	pre 3.89ms	on 6.97ms	post 3.96ms	diff 3.04ms
> write align 32768	pre 3.89ms	on 6.97ms	post 3.96ms	diff 3.04ms
> write align 16384	pre 3.74ms	on 114ms	post 4.05ms	diff 110ms
> write align 8192	pre 4.25ms	on 115ms	post 4.8ms	diff 110ms
> # ./flashbench -A -b 8192 /dev/block/mmcblk0p9
> write align 65536	pre 4.11ms	on 7.46ms	post 4.24ms	diff 3.29ms
> write align 32768	pre 4.15ms	on 7.45ms	post 4.25ms	diff 3.25ms
> write align 16384	pre 4.24ms	on 96.1ms	post 3.83ms	diff 92.1ms

Ok, that is very consistent then at least.

> I thought the following was interesting. I did it to see the big
> time go away, since it would end up being a 16K write straddling an 8K
> boundary, but I don't understand the pre and post results at all.
> 
> # ./flashbench -A -b 16384  /dev/block/mmcblk0p9
> write align 8388608	pre 121ms	on 7.76ms	post 116ms	diff -110845
> write align 4194304	pre 129ms	on 7.57ms	post 115ms	diff -114863
> write align 2097152	pre 121ms	on 7.78ms	post 123ms	diff -114318
> write align 1048576	pre 131ms	on 7.74ms	post 106ms	diff -110856
> write align 524288	pre 131ms	on 7.58ms	post 116ms	diff -115926
> write align 262144	pre 131ms	on 7.55ms	post 115ms	diff -115591
> write align 131072	pre 131ms	on 7.54ms	post 116ms	diff -115617
> write align 65536	pre 131ms	on 7.54ms	post 115ms	diff -115579
> write align 32768	pre 125ms	on 6.89ms	post 116ms	diff -113408

The description of the test case is probably suboptimal. What this does
is 32 KB accesses, with 32 KB alignment in the pre and post cases, but only
16 KB alignment in the "on" case. The idea here is that it should never do
any access with less than "--blocksize" alignment.
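
A rough sketch of that access pattern follows. The function name and the
exact start offsets are assumptions made for illustration (an access of
twice the blocksize, starting two blocks before, one block before, and
exactly at the boundary); this is not the flashbench source.

```python
def access_windows(align, blocksize):
    """Return (offset, size) for the pre, on and post accesses.

    Assumption: every access is 2 * blocksize long; "pre" ends at the
    boundary, "on" straddles it, "post" starts at it.
    """
    size = 2 * blocksize
    return {
        "pre": (align - size, size),
        "on": (align - blocksize, size),
        "post": (align, size),
    }

blocksize = 16384
for align in (8388608, 1048576, 32768):
    for name, (offset, size) in access_windows(align, blocksize).items():
        # Remainder 0 means the access is aligned to its own 32 KB size.
        print(align, name, offset, size, offset % size)
```

With a blocksize of 16384, the pre and post offsets are multiples of 32 KB,
while the "on" offset is only a multiple of 16 KB.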

This is what I think happens:
Since the partition is larger than 64 MB and the card can keep seven 4 MB
allocation units open, writing to 8 locations on the drive separated by 8 MB
causes it to do garbage collection all the time for accesses of 32 KB and
larger. However, the "on" measurement is only 16 KB aligned, so it goes into
T's buffer A for small writes and does not hit the garbage collection all
the time, so it ends up being a lot faster.
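
The first half of that guess can be sketched as a toy model. Everything
here is an assumption made for illustration: that the card evicts the least
recently used allocation unit, that every eviction costs one garbage
collection, and that the test visits the locations round-robin. The separate
path for small writes is not modelled.

```python
from collections import OrderedDict

MB = 1024 * 1024

def count_gc(offsets, open_units=7, au_size=4 * MB):
    """Count evictions in a hypothetical LRU set of open allocation units."""
    open_aus = OrderedDict()
    gc = 0
    for off in offsets:
        au = off // au_size
        if au in open_aus:
            open_aus.move_to_end(au)      # already open, mark as recently used
            continue
        if len(open_aus) >= open_units:
            open_aus.popitem(last=False)  # close the least recently used unit
            gc += 1
        open_aus[au] = True
    return gc

# Four round-robin passes over locations that are 8 MB apart.
eight = [i * 8 * MB for i in range(8)] * 4
seven = [i * 8 * MB for i in range(7)] * 4

print(count_gc(eight))  # 25
print(count_gc(seven))  # 0
```

In this model, eight locations with seven open units means that every write
after the first seven forces an eviction, while seven locations never do.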

	Arnd



More information about the linux-arm-kernel mailing list