[PATCH 0/3] MTD: Change meaning of -EUCLEAN return code on reads

Shmulik Ladkani shmulik.ladkani at gmail.com
Fri Mar 16 17:54:24 EDT 2012


Hi Mike,

On Fri, 16 Mar 2012 09:25:08 -0700 Mike Dunn <mikedunn at newsguy.com> wrote:
> Maybe my (admittedly limited) understanding of the physical nature of NAND flash
> is flawed.  I assumed that a writesize region (i.e., a NAND page for our
> purposes) is the most elemental unit wrt physical wear, regardless of whether or
> not ecc is calculated once for the whole page or incrementally in steps.

Bit-flips may occur on a per-cell basis, even in the OOB cells, as a
result of program-disturb, charge-loss, or cell wear-out causing read
sensing errors.

> But you're saying my assumption is incorrect.  So each ecc-sized area within a
> page is physically distinct and must be considered in isolation? 

There's no "physical" distinction, in the sense of cells being separated
in the device or the like.
Simply, the ECC algorithm is independently calculated over several
portions of the page.
But that's not a must: suppose X bits per Y bytes ECC is required; you
may use a 2X bits per 2Y bytes ECC and achieve similar integrity and
endurance statistical characteristics.

For your purposes, whether the cleaning decision should be made at the
ecc step level depends on how you define
"a dangerously high number of bit errors".

Let's continue with Ivan's example (2KiB page, 4 eccsteps, 512 bytes
each step, strength 4bits/512bytes).
Suppose the first ECC portion has 4 bit errors, the other 3 portions
have none.
Now suppose that, several read operations later, a new bitflip is
introduced within the first portion, leading to 5 bit errors there.
The ECC algorithm is now unable to correct this portion, meaning the
buffer is corrupt - which also means the entire page data read is
corrupt. The nand infrastructure would return -EBADMSG - and you
had just 5 bit errors over the entire page.
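
To make the arithmetic of the example above concrete, here is a rough
sketch in Python. It is a toy model, not the nand code: the function
name read_page and the per-step list of bitflip counts are made up for
illustration, and the errno value is simply hard-coded to the Linux
one. The only rule it encodes is that each ECC step is corrected
independently, and one uncorrectable step fails the whole page read.

```python
EBADMSG = 74  # Linux errno value, hard-coded here for illustration only


def read_page(bitflips_per_step, strength):
    """Toy model of one page read (hypothetical helper, not a kernel API).

    bitflips_per_step: number of bit errors in each ECC step of the page.
    strength: how many bit errors the ECC can correct per step.
    Returns (status, total_bitflips_in_page).
    """
    total = sum(bitflips_per_step)
    # Each step is corrected on its own; a single step above the ECC
    # strength makes the whole page read uncorrectable.
    if any(n > strength for n in bitflips_per_step):
        return -EBADMSG, total
    return 0, total


# 2KiB page, 4 ecc steps, strength 4 bits per 512 bytes
print(read_page([4, 0, 0, 0], strength=4))  # (0, 4): still correctable
print(read_page([5, 0, 0, 0], strength=4))  # (-74, 5): one more flip, page lost
print(read_page([4, 4, 4, 4], strength=4))  # (0, 16): 16 flips, still correctable
```

Note the last case: 16 bit errors spread evenly are still correctable,
while 5 bit errors concentrated in one step are not. The page-wide total
alone does not tell you how close the page is to failing.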

So the question is, would you consider 4 bit errors in the first ECC
portion to be "a dangerously high number of bit errors" that should be
reported to the MTD users?
If so, then yes, the cleaning decision should be according to the ecc
step level, not at the page reading level.
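
The two possible policies can be compared side by side with a small
Python sketch. Everything here is an assumption made for illustration:
the function names, the idea of a single numeric threshold, and the
threshold values themselves are not taken from the patches.

```python
def needs_cleaning_per_step(bitflips_per_step, threshold):
    """Decide on the worst single ECC step (hypothetical policy)."""
    return max(bitflips_per_step) >= threshold


def needs_cleaning_per_page(bitflips_per_step, threshold):
    """Decide on the total over the whole page (hypothetical policy)."""
    return sum(bitflips_per_step) >= threshold


concentrated = [4, 0, 0, 0]  # first step is at the ECC limit (strength 4)
spread = [1, 1, 1, 1]        # same total, every step far from the limit

# Per-step decision with a threshold equal to the ECC strength
print(needs_cleaning_per_step(concentrated, threshold=4))  # True
print(needs_cleaning_per_step(spread, threshold=4))        # False

# Per-page decision with the same numeric threshold cannot tell them apart
print(needs_cleaning_per_page(concentrated, threshold=4))  # True
print(needs_cleaning_per_page(spread, threshold=4))        # True

# Per-page decision scaled to 4 steps x 4 bits misses the dangerous case
print(needs_cleaning_per_page(concentrated, threshold=16))  # False
```

With a page-level count, either the harmless page gets flagged too, or
the page that is one bitflip away from -EBADMSG goes unreported.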

Regards,
Shmulik


