MMC quirks relating to performance/lifetime.

Andrei Warkentin andreiw at motorola.com
Wed Feb 16 21:08:56 EST 2011


On Tue, Feb 15, 2011 at 11:16 AM, Arnd Bergmann <arnd at arndb.de> wrote:
> On Monday 14 February 2011, Andrei Warkentin wrote:
>> > There are multiple ways how this could be implemented:
>> >
>> > 1. Have one exception cache for all "special" blocks. This would normally
>> >   be for FAT32 subdirectory updates, which always write to the same
>> >   few blocks. This means you can do small writes efficiently anywhere
>> >   on the card, but only up to a (small) fixed number of block addresses.
>> >   If you overflow the table, the card still needs to go through an
>> >   extra PE for each new entry you write, in order to free up an entry.
>> >
>> > 2. Have a small number of AUs that can be in a special mode with efficient
>> >   small writes but inefficient large writes. This means that when you
>> >   alternate between small and large writes in the same AU, it has to go
>> >   through a PE on every switch. Similarly, if you do small writes to
>> >   more than the maximum number of AUs that can be held in this mode, you
>> >   get the same effect. This number can be as small as one, because that
>> >   is what FAT32 requires.
>> >
>> > In both cases, you don't actually have a solution for the problem, you just
>> > make it less likely for specific workloads.
>>
>> Aha, ok. By the way, I did find out that either suggestion works. So
>> I'll pull out the reversing portion of the patch. No need to
>> overcomplicate :).
>
> BTW, what file system are you using? I could imagine that each of ext4, btrfs
> and nilfs2 give you very different results here. It could be that if your
> patch is optimizing for one file system, it is actually pessimising for
> another one.
>

Ext4. I've actually been rewriting the patch a lot and it's taking
time because there are a lot of things that are wrong in it (so I feel
kinda bad for forwarding it to this list in the first place...). I've
already mentioned that there is no need to reorder, so that's going
away and it simplifies everything greatly.

I agree, which is why all of this is controlled now through sysfs, and
there are no more hard-coded checks for manfid, mmc versus sd or any
other magic. There is a page_size_secs attribute, through which you
can report the device's page size. The workaround for small
writes crossing the page boundary (and winding up in Buffer B instead
of A) is turned on by setting split_tlow and split_thigh, which
provide a threshold range, in sectors, within which writes will
be split/aligned. The second workaround, which splits larger requests
and writes them with reliable write (so they are not coalesced and
do not wind up in Buffer B again), is controlled through split_relw_tlow
and split_relw_thigh. Do you think there is a better way? Or is this
good enough?

So, as I mentioned before, T had done some tests given data provided
by M, and then T verified that this fix was good. I need to do my own
tests on the patch after I rewrite it. Is iozone the best tool I can
use? So far I have an MMC logging facility through connector that I use
to collect stats (useful for seeing how fs traffic translates to
actual mmc commands... once I clean it up I'll post it here as an RFC).
What about the tool you're writing? Any way I can use it?
