cfi_cmdset_0001.c: Excessive erase suspends

Jamie Lokier jamie at shareable.org
Fri Apr 18 12:35:36 EDT 2008


Alexey Korolev wrote:
> > "Newly-erased block contained word 0xffff0000 at offset 0x00180000"
> > on a board using Intel 28F640J5 flash chips.
> > 
> > It looks like the errors are caused by large amounts of erase suspends.
> > Each erase gets suspended around 8500 times and in some extreme cases
> > a lot more. The erase ends without any error bits set but it turns out
> > that it has failed.
> > 
> > It seems like some flash chips have a limit on the number of times that
> > the erase can be suspended. I have not seen any information on the Intel
> > chips but a Spansion AppNote says 5,980 times for some of their devices
> > before running the risk of an erase fail.

That's very interesting, thanks.

> We saw a similar problem in our tests. As a possible solution, I would
> suggest disabling erase suspend on write.

That's quite bad for write latency, though.  Adding a suspend cycle
counter, and disabling suspend on write when it reaches a certain
number sounds better.
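
Something like the following is roughly what I have in mind.  It is only
a sketch, not a patch against cfi_cmdset_0001.c: the struct, the function
names and the limit of 100 are all made up for illustration, and the real
safe limit is unknown and probably chip-specific.

```c
#include <stdbool.h>

/* Hypothetical limit; the real safe value is unknown (assumption). */
#define ERASE_SUSPEND_LIMIT 100

/* Hypothetical per-chip bookkeeping, not a struct from the real driver. */
struct erase_state {
    bool erasing;           /* an erase is currently in progress */
    unsigned int suspends;  /* suspends taken during the current erase */
};

/* Call when an erase starts: the counter is per erase operation. */
static void erase_begin(struct erase_state *st)
{
    st->erasing = true;
    st->suspends = 0;
}

/* Call when the erase completes. */
static void erase_end(struct erase_state *st)
{
    st->erasing = false;
}

/*
 * Returns true if a pending write may suspend the running erase, and
 * counts that suspend.  Returns false if there is no erase to suspend
 * or the limit has been reached, in which case the writer has to wait
 * for the erase to finish.
 */
static bool may_suspend_for_write(struct erase_state *st)
{
    if (!st->erasing)
        return false;
    if (st->suspends >= ERASE_SUSPEND_LIMIT)
        return false;
    st->suspends++;
    return true;
}
```

Writes keep their low latency for the first ERASE_SUSPEND_LIMIT
suspends of each erase, and only fall back to waiting after that.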

> Regarding the limit of suspend/resume cycles: it is rather unclear how
> many cycles would be OK and how many would not.  Special
> investigation is required here.

That's interesting too.

  - Do other chip docs say how many cycles are acceptable?
    Is there a count we can assume is safe for all devices of this
    type - like say 100?

  - Does the time spent in erase suspend matter?  E.g. if it was
    suspended for 1 minute due to lots of pending writes, restarted,
    and then suspended _again_ for 1 minute, etc. does that reduce the
    number of safe suspend-resume cycles due to the unstable
    partially-erased physical state?

  - Is it worth reading a block after erasing it, to verify that it's
    wiped - and marking blocks which have experienced >threshold suspend
    cycles as needing verification and re-erase, rather than treating
    them as bad blocks?  (Verification could be done lazily, on each
    part of the block just before writing.)  But is this good
    physically, or do too many suspends put the block into an
    unreliable state even if it does pass verification, so that it's
    important to limit the suspends rather than allow many and verify
    afterwards?
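
To make the verify-after-erase idea concrete, here is a rough sketch.
Again, everything in it is an assumption for illustration: the names,
the threshold value, and the 32-bit word width (picked only because the
reported error shows a 32-bit word).  It says nothing about whether a
block that passes the check is physically reliable, which is the open
question above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold; no datasheet value is implied (assumption). */
#define VERIFY_THRESHOLD 100

/* Hypothetical per-block status, not something the driver has today. */
enum block_status {
    BLOCK_OK,            /* erased with few suspends, trusted as-is */
    BLOCK_NEEDS_VERIFY   /* erased with many suspends, check before use */
};

/* Classify a block when its erase completes, from its suspend count. */
static enum block_status classify_after_erase(unsigned int suspends)
{
    return suspends > VERIFY_THRESHOLD ? BLOCK_NEEDS_VERIFY : BLOCK_OK;
}

/*
 * Lazy verification: check only the region about to be written.
 * Returns true if every word reads back as all ones (erased).
 * A false result means "re-erase the block", not "bad block".
 */
static bool region_is_erased(const uint32_t *buf, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        if (buf[i] != 0xffffffffu)
            return false;
    }
    return true;
}
```

The write path would call region_is_erased() only for blocks marked
BLOCK_NEEDS_VERIFY, so blocks erased without many suspends pay nothing
extra.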

-- Jamie


