MMC and reliable write - was: since when does ARM map the kernel memory in sections?

Andrei Warkentin andreiw at motorola.com
Tue Apr 26 21:13:48 EDT 2011


Hi Jamie,

On Tue, Apr 26, 2011 at 5:45 PM, Jamie Lokier <jamie at shareable.org> wrote:
> Peter Waechtler wrote:
>> JEDEC Standard No. 84-A441
>> Page 56
>>
>>
>> Reliable Write: Multiple block write with pre-defined block count and
>> Reliable Write parameters. This transaction is similar to the basic pre-
>> defined multiple-block write (defined in previous bullet) with the
>> following exceptions. The old data pointed to by a logical address must remain
>> unchanged until the new data written to same logical address has been
>> successfully programmed. This is to ensure that the target address
>> updated by the reliable write transaction never contains undefined data.
>>
>> Data must remain valid even if a sudden power loss occurs during the
>> programming.
>>
>> There are two versions of reliable write: legacy implementation and the
>> enhanced implementation. The type of reliable write supported by the device
>> is indicated by the EN_REL_WR bit in the WR_REL_PARAM extended CSD register.
>>  For the case of EN_REL_WR = 0 :
>>
>>
>> More fun on page 147ff:
>>
>> • WR_REL_SET [167]
>> The write reliability settings register indicates the reliability setting for
>> each of the user and general area partitions in the device. The contents of
>> this register are read only if the HS_CTRL_REL is 0 in the WR_REL_PARAM
>> extended CSD register. The default value of these bits is not specified and
>> is determined by the device.
>>
>>
>> it goes on with:
>>
>> Bit[4]: WR_DATA_REL_4
>> 0x0: In general purpose partition 4, the write operation has been optimized
>> for performance and existing data in the partition could be at risk if a power
>> failure occurs.
>>
>> 0x1: In general purpose partition 4, the device protects previously written
>> data if power failure occurs during a write operation.
>
> Hmm...  It all hinges on whether "previously written data" refers just
> to the region being overwritten, or to all the other data in the
> partition?
>
> If MMC writes are specified to only affect the data being written with
> a Write command, and to have stably committed the data when Write
> returns, then "Reliable Write" just means "atomic", and filesystems
> and databases don't actually need that.
>
> Hard disks don't guarantee that, and it's not a problem.  Filesystems
> and databases need barriers and/or durable (stable) commits, and for
> writes in one area not to corrupt data in a different area.
>
> *That's* a problem with other flash devices (and possibly some RAIDs):
> Writes to one area can corrupt data in sectors that aren't being
> written to, over quite a large distance.
>
> I can't tell from the above specification excerpt (by itself) what is
> being guaranteed; it seems ambiguous, but maybe there's a clearer
> definition elsewhere.
>
> It is conceivable that checksums and metadata could be stored into a
> "reliable" partition and some kinds of file data into an "unreliable"
> partition, where filesystem integrity is important and nobody cares
> about the actual data! :-)
>
> -- Jamie
>

I think this basically says: don't end up with corrupt flash if I pull
the power during this MMC transaction. If you pull power during a
regular write, you could end up with ALL affected erase units wiped.

Note that the new definition of reliable write only provides the
guarantee at sector granularity. So if you interrupt the transaction,
you will end up with [new data] followed by [old data]. The old
definition guaranteed the entire range, but the transaction was only
reliable when done over a sector or an erase unit.
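To make the two behaviours concrete, here is a minimal C sketch assuming 512-byte sectors. `rel_sectors` stands in for the device's reliable-write sector count (assumed here to correspond to REL_WR_SEC_C in EXT_CSD), and every function name is hypothetical. The legacy helper clamps each transfer to either one block or one aligned reliable-write unit; the enhanced helper models a power cut after `done` sectors have been programmed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model; SECTOR_SIZE and function names are illustrative,
 * not kernel or JEDEC identifiers. */
#define SECTOR_SIZE 512u

/*
 * Legacy reliable write (EN_REL_WR = 0): assume the range is only
 * protected when the transfer is a single block, or exactly rel_sectors
 * blocks starting on a rel_sectors-aligned block. Return how many blocks
 * the next transfer should carry so it always has one of those shapes.
 */
static uint32_t legacy_rel_wr_blocks(uint32_t start_block, uint32_t nblocks,
                                     uint32_t rel_sectors)
{
    assert(nblocks > 0 && rel_sectors > 0);
    if (start_block % rel_sectors != 0)
        return 1;               /* misaligned start: one block at a time */
    if (nblocks >= rel_sectors)
        return rel_sectors;     /* one full, aligned reliable unit */
    return 1;                   /* short tail: one block at a time */
}

/*
 * Enhanced reliable write (EN_REL_WR = 1): each sector is atomic, so a
 * power cut after `done` sectors leaves new data followed by old data.
 * `dev` holds the current device contents and is updated in place.
 */
static void enhanced_rel_wr_interrupted(uint8_t *dev, const uint8_t *new_data,
                                        uint32_t nsectors, uint32_t done)
{
    if (done > nsectors)
        done = nsectors;
    for (uint32_t s = 0; s < done; s++)      /* fully programmed sectors */
        memcpy(dev + s * SECTOR_SIZE, new_data + s * SECTOR_SIZE, SECTOR_SIZE);
    /* sectors [done, nsectors) keep their old contents untouched */
}
```

For example, a 16-block write starting at block 3 with `rel_sectors = 8` would be issued one block at a time until block 8, where a full 8-block unit can go out in one transfer.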

This means I jumped the gun on implementing REQ_FUA as reliable write,
since REQ_FUA says nothing about atomicity. OTOH, I don't think
anything in the block layer expects massive data corruption on power
loss. In my defence, I saw REQ_FUA as meaning "prevent data corruption
during power loss", hence the reliable write via REQ_FUA in the mmc
layer.

So my questions are:
a) How should reliable writes be handled? Via REQ_META?
b) How do we make sure not to wind up with data corruption on MMCs
for workloads where power can be removed at any moment?

We could always turn on reliable writes (not good performance-wise).
We could turn on reliable writes for a particular range (the enhanced
user partition). We could also turn on reliable writes for a specific
hardware partition. We could even create a mapping layer that
occasionally flushes data to flash atomically, while the actual fs
accesses go to RAM.
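Turning reliability on per hardware partition via WR_REL_SET [167] could look roughly like this in C. The bit layout (bit 0 for the user area, bits 1 to 4 for general purpose partitions 1 to 4) extrapolates from the WR_DATA_REL_4 excerpt quoted above, and placing HS_CTRL_REL at bit 0 of WR_REL_PARAM [166] is an assumption. The helper names are hypothetical, and actually writing the register through EXT_CSD is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HS_CTRL_REL is bit 0 of WR_REL_PARAM [166]. */
#define WR_REL_PARAM_HS_CTRL_REL (1u << 0)

/* Assumed partition index: 0 = user area, 1..4 = GP partitions 1..4. */
static bool partition_is_reliable(uint8_t wr_rel_set, unsigned part)
{
    assert(part <= 4);
    return (wr_rel_set >> part) & 1u;
}

/* Per the quoted excerpt, WR_REL_SET is read only when HS_CTRL_REL is 0. */
static bool wr_rel_set_writable(uint8_t wr_rel_param)
{
    return (wr_rel_param & WR_REL_PARAM_HS_CTRL_REL) != 0;
}

/* Compute a new WR_REL_SET value with `part` switched to reliable
 * (on = true) or performance-optimised (on = false). The host would
 * still have to program this value into EXT_CSD; that is omitted. */
static uint8_t wr_rel_set_update(uint8_t wr_rel_set, unsigned part, bool on)
{
    assert(part <= 4);
    uint8_t bit = (uint8_t)(1u << part);
    return on ? (uint8_t)(wr_rel_set | bit) : (uint8_t)(wr_rel_set & ~bit);
}
```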

A


