mtdblock caching and syncing

Thu Apr 9 13:16:46 EDT 2009

On Thu, Apr 09, 2009 at 12:02:47PM -0400, Doug Graham wrote:
>On Thu, Apr 09, 2009 at 10:51:00AM -0400, Josh Boyer wrote:
>> On Thu, Apr 09, 2009 at 10:15:56AM -0400, Doug Graham wrote:
>> >
>> >The problem is that a sync() or fsync() on an mtdblock device does not
>> >actually get the data all the way to the flash device.  The mtdblock
>> >layer maintains its own cache of a single erase-unit (256KB in my case).
>> >If I open /dev/mtdblock0 for writing, write some stuff to it, then call
>> >fsync() but do not close the device, up to one erase-unit's worth of
>> >data may still be buffered in memory.  This data is only flushed when
>> >the device is actually closed (by mtdblock_release).  I think that
>> >this violates the intended semantics of sync and fsync.  I shouldn't be
>> >required to do a close() to force the data to the device.
>> 
>> The device in question isn't the flash.  It's the mtdblock device.  So
>> fsync semantics are preserved.  This is the same as writing to a file
>> on a hard drive, calling fsync, and having it sit in the hard drive's
>> cache.
>
>That's a good point, and one I've wondered about before.  I don't know
>much about how hard drives manage their cache, but I would assume that
>they don't leave dirty data in their cache for an unbounded period
>of time.  I'd guess that data is written to the actual disk within a
>few tens of milliseconds after being sent to the device.

Right.

>In the case of mtdblock, dirty data can stay in the cache forever.

True.  And I agree that is entirely sub-optimal.

>> >I think this is a fairly serious bug in a flash-based system, where there
>> >are frequently times that you want to make sure that data has actually
>> >made it all the way to the device.  I think that a sync() or fsync()
>> >really ought to somehow propagate all the way down to the mtdblock layer
>> >so that mtdblock can flush its buffer.
>> 
>> Why are you using mtdblock in a serious flash-based system?  The fact
>> that it buffers an entire eraseblock means you risk huge data loss in
>> the event of an unclean shutdown anyway (power loss).  No amount of
>> sync or fsync will fix that.
>
>We don't use mtdblock during normal operations; we use squashfs and jffs2
>(maybe ubifs sometime soon).  But one job that we do use mtdblock for is
>burning loads.  We could, and perhaps should, be using the char device
>instead to burn loads, except that those require specialized tools to do
>erases before writes.  To avoid the need for such specialized tools, we
>just use the equivalent of dd on the mtdblock device followed by a sync.
>But that doesn't work given the behaviour I'm complaining about.
>
>It's actually a little more complicated than that.  We have a system
>comprised of multiple cards.  When upgrading the system from the master
>card, we're using NBD to upgrade (some) loads on remote cards.  The NBD
>server running on the remote cards never closes the mtdblock device that
>it is managing, so the mtdblock_release() method never gets called.
>The NBD server cannot use the MTD character device because it knows
>nothing about the characteristics of flash, including the need to erase
>before writing.  Even if it did know about erasing, we'd want it to do
>exactly the same kind of caching that mtdblock already does, so mtdblock
>does seem like a good match in this case.  We can certainly modify the
>NBD server to close and reopen the device when it needs to be sure that
>data has actually been written to flash, but that seems a bit on the
>kludgy side, and doesn't help any other applications using mtdblock
>(like the dd scheme I mention above).
>
>> >Thoughts?  Suggestions?  Patches?
>> 
>> Word-weaseling aside, if you have patches that fix the behavior you don't
>> like, they would certainly be looked at.  Setting pdflush to 5 seconds
>> instead of 30 would help a bit, or using the ioctl on the mtdblock device
>> that already exists to flush would help too.  However, you might want to
>> really look at a system design that relies on mtdblock for data integrity.
>
>What's the point of mtdblock then?  All systems care about data integrity
>to some degree (some more than others, obviously), so if mtdblock makes
>no effort to preserve that integrity, where do you see it ever being
>used legitimately?

For cramfs, which is read-only.  Or in cases sort of like what you describe,
where the conditions of writes are tightly controlled and failure does not
produce a bricked device that a customer is going to be grumpy about.

josh
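
The flush ioctl mentioned above is not named in the thread.  The sketch
below assumes it is BLKFLSBUF (from <linux/fs.h>), which the block layer
is expected to hand down to the mtdblock driver so that it can write out
its cached erase unit.  The helper names write_fully() and
flush_to_flash() are hypothetical, and whether the ioctl reaches the
driver's flush routine may depend on the kernel version, so treat this
as an illustration of the idea rather than a verified fix.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKFLSBUF */

/* Write all of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write()). */
static int write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: first push dirty pages out of the page cache
 * with fsync(), then ask the block device to flush its own buffers.
 * ASSUMPTION: BLKFLSBUF is the ioctl referred to in the thread and the
 * mtdblock driver responds to it by writing out its cached erase unit.
 * Returns 0 on success, -1 on error (errno is set). */
static int flush_to_flash(int fd)
{
    if (fsync(fd) < 0)
        return -1;
    return ioctl(fd, BLKFLSBUF, 0);
}
```

A caller such as an NBD server could keep /dev/mtdblock0 open, call
write_fully() for each chunk, and call flush_to_flash() at the points
where the data must be on flash, instead of closing and reopening the
device.  BLKFLSBUF normally requires CAP_SYS_ADMIN, so the process
would typically need to run as root.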