MMC quirks relating to performance/lifetime.
Andrei Warkentin
andreiw at motorola.com
Sat Feb 19 23:39:06 EST 2011
On Sat, Feb 19, 2011 at 3:54 AM, Arnd Bergmann <arnd at arndb.de> wrote:
> On Friday 18 February 2011 23:40:16 Andrei Warkentin wrote:
>> On Fri, Feb 18, 2011 at 1:47 PM, Andrei Warkentin <andreiw at motorola.com> wrote:
>>
>> Flashbench timings for both Sandisk and Toshiba cards. Attaching due to size.
>
> Very nice, thanks for the measurement!
>
> I don't think having the results inline in the mail is a problem,
> it would even make it easier to quote.
>
>> Some interesting things that I don't understand. For the align test, I
>> extended it to do a write align test (-A). I tried two partitions that
>> I could write over, and both read and writes behaved differently for
>> the two partitions on same device. Odd. They are both 4MB aligned.
>
> I never did a write align test because the results will be highly
> unreliable as soon as you get into thrashing. Your results seem
> to be meaningful still, so maybe we should have it after all, but
> I'll put a big warning on it.
>
Actually, it would also be a good idea to bail or warn when the AU
test is run with more open AUs than fit on the device that was passed
in, since the accesses will just wrap around and skew the results.
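
A minimal sketch of the check I have in mind, written in Python rather
than flashbench's C. It assumes the AU test touches open_au_nr distinct
AUs of erase_size bytes each. The function name and parameters are
hypothetical and not part of flashbench, and the 4 MiB AU size is an
assumption based on the largest block size in the runs below.

```python
MIB = 1024 * 1024


def check_open_au_fits(device_bytes, erase_size, open_au_nr):
    """Return (ok, max_aus) for a requested number of open AUs.

    Hypothetical helper, not flashbench code: ok is True when
    open_au_nr distinct AUs of erase_size bytes fit on the device.
    """
    if erase_size <= 0 or open_au_nr <= 0:
        raise ValueError("erase_size and open_au_nr must be positive")
    max_aus = device_bytes // erase_size
    return open_au_nr <= max_aus, max_aus


if __name__ == "__main__":
    # Partition sizes as quoted in this thread (about 40 MB and 314 MB).
    for name, size in (("mmcblk0p6", 40 * MIB), ("mmcblk0p9", 314 * MIB)):
        ok, max_aus = check_open_au_fits(size, 4 * MIB, 20)
        verdict = "ok" if ok else "would wrap around"
        print(f"{name}: 20 open AUs requested, {max_aus} fit: {verdict}")
```

With a check like this the tool could refuse to run, or at least print
a warning, instead of silently wrapping.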
>> On the sandisk it was the write align that made the page size stand
>> out. The read align had pretty constant results.
>
> I've noticed on other Sandisk media that the read align test is
> sometimes useless. It may help to do a full erase of the partition,
> or to fill it with data before running the test.
>
>> On the toshiba the results varied wildly for the two partitions. For
>> partition 6, there was a clear pattern in the diff values for read
>> align. For 9, it was all over the place. For 9 with the write align,
>> 8K and 16K the crossing writes took ~115ms!! Look in attached files
>> for all the data.
>
> Partition 6 is a lot smaller, so you have the accesses less than a
> segment apart, so it shows other effects.
>
>> The AU tests were interesting too, especially how with several open
>> AUs the throughput is higher for certain smaller sizes on Sandisk. If
>> I interpret it correctly, both cards have at least 4 AUs, as I haven't
>> yet seen a significant drop for small sizes. I am now running the
>> larger ones on mmcblk0p9, which is sufficiently large for these
>> tests (mmcblk0p6 is only 40 MB, p9 is 314 MB).
>
> Right, you should try larger values for --open-au-nr here. It's at
> least a good sign that the drive can do random access inside a segment
> and that it can have at least 4 segments open. This is much better
> than I expected from your descriptions at first.
Actually, the Toshiba one seems to have 7 AUs, if I interpret this
correctly: throughput at small sizes holds up with 7 open AUs and
collapses with 8.
^C
# ./flashbench -O -0 6 -b 512 /dev/block/mmcblk0p9
4MiB 5.91M/s
2MiB 8.84M/s
1MiB 10.8M/s
512KiB 13M/s
256KiB 13.6M/s
^C
# ./flashbench -O -0 7 -b 512 /dev/block/mmcblk0p9
4MiB 6.32M/s
2MiB 8.63M/s
1MiB 10.5M/s
512KiB 13.2M/s
256KiB 13M/s
128KiB 12.3M/s
^C
# ./flashbench -O -0 8 -b 512 /dev/block/mmcblk0p9
4MiB 6.65M/s
2MiB 7.02M/s
1MiB 6.36M/s
512KiB 3.17M/s
256KiB 1.53M/s
The Sandisk one has 20 AUs.
# ./flashbench -O -0 20 -b 512 /dev/block/mmcblk0p9
4MiB 11.3M/s
2MiB 12.8M/s
1MiB 9.87M/s
512KiB 9.97M/s
256KiB 9.13M/s
128KiB 8.05M/s
^C
# ./flashbench -O -0 50 -b 512 /dev/block/mmcblk0p9
4MiB 7.19M/s
^C
# ./flashbench -O -0 2 -b 512 /dev/block/mmcblk0p9
^C
# ./flashbench -O -0 22 -b 512 /dev/block/mmcblk0p9
4MiB 11.6M/s
2MiB 12.3M/s
1MiB 5.13M/s
512KiB 2.57M/s
256KiB 1.59M/s
128KiB 1.16M/s
64KiB 776K/s
^C
# ./flashbench -O -0 21 -b 512 /dev/block/mmcblk0p9
4MiB 11.2M/s
2MiB 12.4M/s
1MiB 4.65M/s
512KiB 1.95M/s
256KiB 955K/s
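
For what it's worth, the way I am reading the open-AU limit off these
runs can be sketched in Python. The 256KiB figures are copied from the
output above. The 50% drop threshold and the function name are
arbitrary choices of mine, not anything flashbench does.

```python
def estimate_open_aus(small_block_mbps, drop_factor=0.5):
    """Largest tested open-AU count before small-block throughput collapses.

    small_block_mbps maps an open-AU count to the throughput (MB/s)
    measured at one small block size. Counts are walked in ascending
    order, stopping at the first one that falls below drop_factor
    times the best throughput seen in the data.
    """
    best = max(small_block_mbps.values())
    limit = None
    for count in sorted(small_block_mbps):
        if small_block_mbps[count] < drop_factor * best:
            break
        limit = count
    return limit


# 256KiB throughput in MB/s, keyed by the number of open AUs.
toshiba_256k = {6: 13.6, 7: 13.0, 8: 1.53}
sandisk_256k = {20: 9.13, 21: 0.955, 22: 1.59}

if __name__ == "__main__":
    print("Toshiba:", estimate_open_aus(toshiba_256k))
    print("Sandisk:", estimate_open_aus(sandisk_256k))
```

The result is only as good as the set of counts that were actually
tested, so it is a lower bound on the real limit rather than a
measurement of it.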
>
> However, the drop from 32 KB to 16 KB in performance is horrifying
> for the Toshiba drive, it's clear that this one does not like
> to be accessed smaller than 32 KB at a time, an obvious optimization
> for FAT32 with 32 KB clusters. How does this change with your
> kernel patches?
The only performance-increasing patch here is the one that splits
unaligned accesses, so I wouldn't expect any improvement for
page-aligned accesses smaller than 32KB. As you can see here:
# cat /sys/block/mmcblk0/device/page_size
8192
# ./flashbench -O -0 1 -b 512 /dev/block/mmcblk0p9
4MiB 6.81M/s
2MiB 7.73M/s
1MiB 9.21M/s
512KiB 9.98M/s
256KiB 10.3M/s
128KiB 10.2M/s
64KiB 9.76M/s
32KiB 8.52M/s
16KiB 3.68M/s
8KiB 1.72M/s
4KiB 837K/s
^C
# echo 0 > /sys/block/mmcblk0/device/page_size
# ./flashbench -O -0 1 -b 512 /dev/block/mmcblk0p9
4MiB 6.42M/s
2MiB 7.79M/s
1MiB 9.22M/s
512KiB 10M/s
256KiB 9.94M/s
128KiB 10.1M/s
64KiB 9.68M/s
32KiB 8.5M/s
16KiB 3.65M/s
8KiB 1.73M/s
4KiB 838K/s
2KiB 417K/s
^C
#
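
To put a number on "no improvement", here is a small Python sketch
that compares the two runs above. The figures are copied from the
output. The variable and function names are mine and purely
illustrative.

```python
# Throughput in MB/s keyed by block size in KiB, from the two runs above.
page_size_8192 = {
    4096: 6.81, 2048: 7.73, 1024: 9.21, 512: 9.98, 256: 10.3, 128: 10.2,
    64: 9.76, 32: 8.52, 16: 3.68, 8: 1.72, 4: 0.837,
}
page_size_0 = {
    4096: 6.42, 2048: 7.79, 1024: 9.22, 512: 10.0, 256: 9.94, 128: 10.1,
    64: 9.68, 32: 8.5, 16: 3.65, 8: 1.73, 4: 0.838, 2: 0.417,
}


def relative_change(before, after):
    """Relative throughput change per block size present in both runs."""
    return {
        size: (after[size] - before[size]) / before[size]
        for size in before
        if size in after
    }


def drop_ratio(run, larger_kib, smaller_kib):
    """How many times faster the larger block size is than the smaller."""
    return run[larger_kib] / run[smaller_kib]


if __name__ == "__main__":
    changes = relative_change(page_size_8192, page_size_0)
    worst = max(changes, key=lambda size: abs(changes[size]))
    print(f"largest change: {changes[worst]:+.1%} at {worst}KiB")
    for label, run in (("8192", page_size_8192), ("0", page_size_0)):
        ratio = drop_ratio(run, 32, 16)
        print(f"page_size={label}: 32KiB is {ratio:.2f}x faster than 16KiB")
```

These are single runs, so I would not read anything into differences
of a few percent. The drop from 32KiB to 16KiB is there in both.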
>
> For the sandisk drive, it's funny how it is consistently faster
> doing random access than linear access. I don't think I've seen that
> before. It does seem to have some cache for linear access using
> smaller than 16 KB, and can probably combine them when it's only
> writing to a single segment.
Yes, that is pretty interesting. Smaller than 16K? Not smaller than
32K? I wonder what it is doing...