mtdblock caching and syncing
jwboyer at gmail.com
Thu Apr 9 13:16:46 EDT 2009
On Thu, Apr 09, 2009 at 12:02:47PM -0400, Doug Graham wrote:
>On Thu, Apr 09, 2009 at 10:51:00AM -0400, Josh Boyer wrote:
>> On Thu, Apr 09, 2009 at 10:15:56AM -0400, Doug Graham wrote:
>> >The problem is that a sync() or fsync() on an mtdblock device does not
>> >actually get the data all the way to the flash device. The mtdblock
>> >layer maintains its own cache of a single erase-unit (256KB in my case).
>> >If I open /dev/mtdblock0 for writing, write some stuff to it, then call
>> >fsync() but do not close the device, up to one erase-unit's worth of
>> >data may still be buffered in memory. This data is only flushed when
>> >the device is actually closed (by mtdblock_release). I think that
>> >this violates the intended semantics of sync and fsync. I shouldn't be
>> >required to do a close() to force the data to the device.
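To make the failure mode concrete, here is a deliberately simplified, in-memory model of a single-eraseblock write-back cache. It is not mtdblock's code. The names (eb_cache, cache_write, cache_release), the 16-byte "erase unit", and the eviction rule are illustrative assumptions. The point is narrow: in this model, a write stays dirty in RAM until either a later write lands in a different erase block or the release path runs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model only: names and sizes are illustrative. */
#define ERASE_SIZE 16u   /* stand-in for a 256 KiB erase unit */
#define NUM_BLOCKS 4u

/* Simulated flash contents. */
static unsigned char flash[ERASE_SIZE * NUM_BLOCKS];

struct eb_cache {
    unsigned char data[ERASE_SIZE]; /* RAM copy of one erase block */
    size_t block;                   /* index of the cached erase block */
    int valid;                      /* data[] holds a block */
    int dirty;                      /* data[] differs from flash */
    int flushes;                    /* erase+program cycles performed */
};

/* Write the cached block back (real flash: erase, then program). */
static void cache_flush(struct eb_cache *c)
{
    if (c->valid && c->dirty) {
        memcpy(&flash[c->block * ERASE_SIZE], c->data, ERASE_SIZE);
        c->dirty = 0;
        c->flushes++;
    }
}

/* Buffer a write; only moving to a different erase block evicts. */
static void cache_write(struct eb_cache *c, size_t off,
                        const void *buf, size_t len)
{
    const unsigned char *p = buf;

    assert(off + len <= sizeof flash);
    while (len > 0) {
        size_t block = off / ERASE_SIZE;
        size_t in_block = off % ERASE_SIZE;
        size_t n = ERASE_SIZE - in_block;

        if (n > len)
            n = len;
        if (!c->valid || c->block != block) {
            cache_flush(c);   /* evict the previously cached block */
            /* read-modify-write: start from the block's current contents */
            memcpy(c->data, &flash[block * ERASE_SIZE], ERASE_SIZE);
            c->block = block;
            c->valid = 1;
        }
        memcpy(&c->data[in_block], p, n);   /* modified in RAM only */
        c->dirty = 1;
        off += n;
        p += n;
        len -= n;
    }
}

/* Model of the release (close) path: the only unconditional flush. */
static void cache_release(struct eb_cache *c)
{
    cache_flush(c);
    c->valid = 0;
}
```

There is deliberately no sync hook in the model. That mirrors the complaint: an fsync() issued above this layer never asks it to write out its dirty block.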
>> The device in question isn't the flash. It's the mtdblock device. So
>> fsync semantics are preserved. This is the same as writing to a file
>> on a hard drive, calling fsync, and having it sit in the hard drive's cache.
>That's a good point, and one I've wondered about before. I don't know
>much about how hard drives manage their cache, but I would assume that
>they don't leave dirty data in their cache for an unbounded period
>of time. I'd guess that data is written to the actual disk within a
>few tens of milliseconds after being sent to the device.
>In the case of mtdblock, dirty data can stay in the cache forever.
True. And I agree that is entirely sub-optimal.
>> >I think this is a fairly serious bug in a flash-based system, where there
>> >are frequently times that you want to make sure that data has actually
>> >made it all the way to the device. I think that a sync() or fsync()
>> >really ought to somehow propagate all the way down to the mtdblock layer
>> >so that mtdblock can flush its buffer.
>> Why are you using mtdblock in a serious flash-based system? The fact
>> that it buffers an entire eraseblock means you risk huge data loss in
>> the event of an unclean shutdown anyway (power loss). No amount of
>> sync or fsync will fix that.
>We don't use mtdblock during normal operations; we use squashfs and jffs2
>(maybe ubifs sometime soon). But one job that we do use mtdblock for is
>burning loads. We could, and perhaps should, be using the char device
>instead to burn loads, except that it requires specialized tools to
>erase before writing. To avoid the need for such specialized tools, we
>just use the equivalent of dd on the mtdblock device followed by a sync.
>But that doesn't work given the behaviour I'm complaining about.
>It's actually a little more complicated than that. We have a system
>made up of multiple cards. When upgrading the system from the master
>card, we're using NBD to upgrade (some) loads on remote cards. The NBD
>server running on the remote cards never closes the mtdblock device that
>it is managing, so the mtdblock_release() method never gets called.
>The NBD server cannot use the MTD character device because it knows
>nothing about the characteristics of flash, including the need to erase
>before writing. Even if it did know about erasing, we'd want it to do
>exactly the same kind of caching that mtdblock already does, so mtdblock
>does seem like a good match in this case. We can certainly modify the
>NBD server to close and reopen the device when it needs to be sure that
>data has actually been written to flash, but that seems a bit on the
>kludgy side, and doesn't help any other applications using mtdblock
>(like the dd scheme I mention above).
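The close-and-reopen workaround for the NBD server could be sketched roughly as follows. reopen_to_flush() is a hypothetical helper. The sketch assumes, as the message says, that the cached erase block is written out when the device is closed (by mtdblock_release); if another process still holds the device open, closing this one descriptor may not get that far. Reopening also loses the file offset, so the sketch assumes the caller uses positioned I/O (pread()/pwrite()).

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: flush by closing and reopening the device.
 * Assumption: the close reaches mtdblock_release(), i.e. nothing else
 * holds the device open. flags is e.g. O_RDWR; O_CREAT/O_TRUNC make no
 * sense here. */
static int reopen_to_flush(int *fd, const char *path, int flags)
{
    int rc;
    int nfd;

    /* Push the page cache down to the block driver first. */
    if (fsync(*fd) < 0)
        return -1;

    /* On Linux the descriptor is released even if close() fails. */
    rc = close(*fd);
    *fd = -1;
    if (rc < 0)
        return -1;

    /* Reopen; the file offset starts at 0 again. */
    nfd = open(path, flags);
    if (nfd < 0)
        return -1;
    *fd = nfd;
    return 0;
}
```

This also shows why the approach feels kludgy: every flush costs an fsync, a close and an open, and a failed reopen leaves the server without a descriptor at all.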
>> >Thoughts? Suggestions? Patches?
>> Word-weaseling aside, if you have patches that fix the behavior you don't
>> like, they would certainly be looked at. Setting pdflush to 5 seconds
>> instead of 30 would help a bit, or using the ioctl on the mtdblock device
>> that already exists to flush would help too. However, you might want to
>> take a hard look at any system design that relies on mtdblock for data integrity.
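Flushing through an ioctl on the still-open mtdblock device might look roughly like the sketch below. The message does not name the ioctl; the sketch assumes it is BLKFLSBUF, which, to the best of my knowledge, the MTD block-translation layer of this era passed to mtdblock's flush routine. flush_without_close() is a hypothetical helper name. BLKFLSBUF normally requires CAP_SYS_ADMIN, and on an ordinary (non-MTD) block device it also invalidates that device's buffer cache, so verify the behaviour against the kernel you actually run.

```c
#include <fcntl.h>       /* open() and O_* flags for callers */
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>    /* BLKFLSBUF */

/* Hypothetical helper: flush as far down as possible without closing.
 * Assumption: BLKFLSBUF is the "ioctl that already exists to flush". */
static int flush_without_close(int fd)
{
    struct stat st;

    /* Push the page cache down to the block driver. */
    if (fsync(fd) < 0)
        return -1;

    if (fstat(fd, &st) < 0)
        return -1;

    /* Not a block device (e.g. a regular file): fsync is all we can do. */
    if (!S_ISBLK(st.st_mode))
        return 0;

    /* Ask the block driver to write out its own buffered data. */
    if (ioctl(fd, BLKFLSBUF, 0) < 0)
        return -1;
    return 0;
}
```

An NBD server or a dd-style burner could call this whenever it needs data on flash, instead of closing and reopening the device.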
>What's the point of mtdblock then? All systems care about data integrity
>to some degree (some more than others, obviously), so if mtdblock makes
>no effort to preserve that integrity, where do you see it ever being used?
For cramfs, which is read-only. Or in cases sort of like what you describe,
where the conditions under which writes happen are tightly controlled and failure does not
produce a bricked device that a customer is going to be grumpy about.