[PATCH 0/6] mtd: Fix unnecessary flash erase and write errors

Paul Parsons lost.distance at yahoo.com
Wed Mar 7 09:09:44 EST 2012


The problem:

An HP iPAQ hx4700 consistently reports flash erase and write errors when running
Linux:

block erase failed at 0x02c00000: status 0xa000a0. Retrying...
block erase failed at 0x02c00000: status 0xa000a0. Retrying...
block erase failed at 0x02c00000: status 0xa000a0. Retrying...
SR.4 or SR.5 bits set in buffer write (status a000a0). Clearing.

block erase failed at 0x02c40000: status 0xa000a0. Retrying...
block erase failed at 0x02c40000: status 0xa000a0. Retrying...
block erase failed at 0x02c40000: status 0xa000a0. Retrying...
physmap-flash: block erase failed at 0x02c40000 (status 0xa000a0)

block erase failed at 0x00440000: status 0xa000a0. Retrying...
block erase failed at 0x00440000: status 0xa000a0. Retrying...
block erase failed at 0x00440000: status 0xa000a0. Retrying...
physmap-flash: block erase error: (bad VPP)
physmap-flash: buffer write error (status 0xd000d0)
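
For readers decoding the status words in the log above, here is a minimal userspace sketch. It assumes the usual Intel-style status register layout (SR.7 ready, SR.6 erase suspended, SR.5 erase error, SR.4 program error, SR.3 VPP low) and assumes that a value such as 0xa000a0 is the status of two interleaved 16-bit chips on a 32-bit bus. The names merge_status() and describe_status() are hypothetical and are not kernel functions.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed Intel-style status register bits. */
#define SR_READY          0x80u  /* SR.7: write state machine ready */
#define SR_ERASE_SUSPEND  0x40u  /* SR.6: erase suspended */
#define SR_ERASE_ERROR    0x20u  /* SR.5: erase error */
#define SR_PROGRAM_ERROR  0x10u  /* SR.4: program error */
#define SR_VPP_LOW        0x08u  /* SR.3: vpp below the required level */

/* Assumption: two interleaved chips, each reporting in the low byte of
 * its 16-bit half of the bus word. OR the two halves together. */
static unsigned merge_status(unsigned long word)
{
    return (unsigned)((word | (word >> 16)) & 0xffu);
}

/* Write a space-separated list of the flags set in 'word' into 'buf'. */
static void describe_status(unsigned long word, char *buf, size_t len)
{
    unsigned sr = merge_status(word);

    snprintf(buf, len, "%s%s%s%s%s",
             (sr & SR_READY)         ? "ready "           : "",
             (sr & SR_ERASE_SUSPEND) ? "erase-suspended " : "",
             (sr & SR_ERASE_ERROR)   ? "erase-error "     : "",
             (sr & SR_PROGRAM_ERROR) ? "program-error "   : "",
             (sr & SR_VPP_LOW)       ? "vpp-low "         : "");
}
```

Under these assumptions, 0xa000a0 decodes to "ready" plus "erase-error", and 0xd000d0 decodes to "ready", "erase-suspended" and "program-error".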

The cause:

The flash program/erase voltage (vpp) is turned off by the MTD CFI driver while
erase operations are in progress or suspended. This kills the erase operations.

The culprit:

./drivers/mtd/chips/cfi_cmdset_0001.c contains two functions, get_chip() and
put_chip(), which are called before and after every flash operation.
The intention seems to be that if one thread is waiting for a Block Erase to
finish, another thread wanting to perform a concurrent operation within the same
partition will suspend the erase in get_chip() and resume it in put_chip(),
without disturbing vpp.
If the other thread wants to perform a concurrent operation within a different
partition, no suspend/resume is performed but put_chip() will call DISABLE_VPP()
to turn vpp off.
That is exactly what happens: a pending Block Erase in one partition expects vpp
to remain on, but a Read Array in another partition turns vpp off.
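
The sequence described above can be modelled in a few lines of userspace C. This is only a sketch of the failure mode, not the driver code: struct demo_chip and all demo_* functions are hypothetical, and the real get_chip()/put_chip() logic is reduced to the single effect that matters here, namely a read epilogue that turns vpp off unconditionally.

```c
#include <stdbool.h>

/* Hypothetical model: one chip with a single vpp line shared by all
 * partitions. */
struct demo_chip {
    bool vpp_on;         /* state of the program/erase voltage */
    bool erase_pending;  /* a Block Erase is in flight somewhere */
    bool erase_killed;   /* vpp dropped while the erase was pending */
};

/* Non-counting set_vpp(): every call drives the line directly. */
static void demo_set_vpp(struct demo_chip *c, bool on)
{
    c->vpp_on = on;
    if (!on && c->erase_pending)
        c->erase_killed = true;
}

/* Erase path: enables vpp and starts the erase; it completes later. */
static void demo_erase_start(struct demo_chip *c)
{
    demo_set_vpp(c, true);
    c->erase_pending = true;
}

/* Read in a different partition: it never enabled vpp, yet its
 * put_chip()-style epilogue disables it. */
static void demo_read_other_partition(struct demo_chip *c)
{
    /* ... Read Array would happen here ... */
    demo_set_vpp(c, false);
}

/* Erase completion: report whether the erase survived, then drop vpp. */
static bool demo_erase_finish(struct demo_chip *c)
{
    bool ok = !c->erase_killed;

    c->erase_pending = false;
    demo_set_vpp(c, false);
    return ok;
}

/* Erase in partition A, interleaved with a read in partition B. */
static bool demo_run_bug_scenario(void)
{
    struct demo_chip c = { false, false, false };

    demo_erase_start(&c);
    demo_read_other_partition(&c);
    return demo_erase_finish(&c);
}
```

In this model demo_run_bug_scenario() returns false, because the read drops vpp while the erase is still pending.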

The fix:

1. Ensure that only those flash operations which call ENABLE_VPP() can then call
DISABLE_VPP(). Other operations should never call DISABLE_VPP().

Consequently...

2. Ensure that calls to ENABLE_VPP() / DISABLE_VPP() (i.e. set_vpp()) can nest.
This requirement is already stated in ./include/linux/mtd/map.h:

        /* set_vpp() must handle being reentered -- enable, enable, disable
           must leave it enabled. */
        void (*set_vpp)(struct map_info *, int);

and a method for doing so is suggested in ./drivers/mtd/chips/cfi_cmdset_0001.c:

        /* We should really make set_vpp() count, rather than doing this */
        DISABLE_VPP(map);

But only 1 of the 5 MTD map drivers which provide a set_vpp() implementation
includes a reference counter.
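
The two points above can be sketched together as follows. This is a userspace illustration under stated assumptions, not the code from the patches: struct demo_map and the demo_* functions are hypothetical, and the locking a real map driver would presumably need around the counter is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a map driver's private state. */
struct demo_map {
    int  vpp_refcnt;  /* outstanding enable requests */
    bool vpp_on;      /* state of the vpp line */
};

/* Counting set_vpp(): only the first enable and the last disable
 * touch the line, so enable, enable, disable leaves it enabled. */
static void demo_set_vpp(struct demo_map *m, int on)
{
    if (on) {
        if (++m->vpp_refcnt == 1)
            m->vpp_on = true;
    } else {
        assert(m->vpp_refcnt > 0);  /* unmatched disable is a caller bug */
        if (--m->vpp_refcnt == 0)
            m->vpp_on = false;
    }
}

/* Point 1: a read never enabled vpp, so it never disables it. */
static void demo_read(struct demo_map *m)
{
    (void)m;  /* ... Read Array would happen here ... */
}

/* Point 2: nested enable/disable pairs from different partitions. */
static bool demo_nested_sequence(struct demo_map *m)
{
    demo_set_vpp(m, 1);  /* erase in partition A begins */
    demo_set_vpp(m, 1);  /* write in partition B begins */
    demo_read(m);        /* read in partition C leaves vpp alone */
    demo_set_vpp(m, 0);  /* write in partition B ends */
    return m->vpp_on;    /* still on for the erase in partition A */
}
```

With a zero-initialised struct demo_map, demo_nested_sequence() returns true, and one further demo_set_vpp(m, 0) for the erase brings the counter back to zero and turns vpp off.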

The patch set:

[PATCH 1/6] mtd: chips: cfi_cmdset_0001: Match ENABLE_VPP()/DISABLE_VPP() calls
[PATCH 2/6] mtd: chips: cfi_cmdset_0002: Match ENABLE_VPP()/DISABLE_VPP() calls
[PATCH 3/6] mtd: maps: physmap: Add reference counter to set_vpp()
[PATCH 4/6] mtd: maps: l440gx: Add reference counter to set_vpp()
[PATCH 5/6] mtd: maps: pcmciamtd: Add reference counter to set_vpp()
[PATCH 6/6] mtd: maps: sa1100-flash: Add reference counter to set_vpp()

Patches 1 and 3 eliminate all erase and write errors on the hx4700.

Patches 2, 4, 5 and 6 apply the same fix to the other combinations of CFI command
set and MTD map driver. The fifth MTD map driver, dilnetpc, already includes a
set_vpp() reference counter.


