[PATCH RESEND] gpmi-nand: Handle ECC Errors in erased pages

Mon Apr 18 07:47:20 PDT 2016

Hi Boris,

On Fri, Apr 15, 2016 at 11:39:06AM +0200, Boris Brezillon wrote:
> Hi Markus,
> 
> On Fri, 15 Apr 2016 11:35:07 +0200
> Markus Pargmann <mpa at pengutronix.de> wrote:
> 
> > Hi Boris,
> > 
> > On Friday 15 April 2016 10:35:08 Boris Brezillon wrote:
> > > Hi Markus,
> > > 
> > > On Fri, 15 Apr 2016 09:55:45 +0200
> > > Markus Pargmann <mpa at pengutronix.de> wrote:
> > > 
> > > > On Wednesday 13 April 2016 00:51:55 Boris Brezillon wrote:
> > > > > On Tue, 12 Apr 2016 22:39:08 +0000
> > > > > Han Xu <han.xu at nxp.com> wrote:
> > > > > 
> > > > > > > Thanks for the feedback. Talking with a coworker about this we may have found a
> > > > > > > better approach to this that is less complicated to implement. The hardware
> > > > > > > unit allows us to set a bitflip threshold for erased pages. The ECC unit
> > > > > > > creates an ECC error only if the number of bitflips exceeds this threshold, but
> > > > > > > it does not correct these. So the idea is to change the patch so that we set
> > > > > > > pages, that are signaled by the ECC as erased, to 0xff completely without
> > > > > > > checking. So the ECC will do all the work and we completely trust in its
> > > > > > > abilities to do it correctly.
> > > > > > 
> > > > > > Sounds good.
> > > > > > 
> > > > > > some new platforms with new gpmi controller could check the count of 0 bits in page,
> > > > > > refer to my patch https://patchwork.ozlabs.org/patch/587124/
> > > > > > 
> > > > > > > But for all legacy platforms, IMO, since bitflips are a rare case, setting the threshold to 0,
> > > > > > > checking only the uncorrectable branch, and then correcting the data sounds better. Setting a
> > > > > > > threshold and correcting every erased page may significantly impact performance.
> > > > > 
> > > > > > Indeed, bitflips in erased pages are not so common, and penalizing the
> > > > > likely case (erased pages without any bitflips) doesn't look like a good
> > > > > idea in the end.
> > > > 
> > > > Are erased pages really read that often?
> > > 
> > > Yes, it's not unusual to have those "empty pages?" checks (added Artem
> > > > and Richard to get a confirmation). AFAIR, UBIFS checks for empty pages
> > > in its journal heads after an unclean unmount (which happens quite
> > > often) to make sure there's no corruption.
> > > 
> > > > I am not sure how UBI handles
> > > > this, does it read every page before writing?
> > > 
> > > Nope, or maybe it does when you activate some extra checks.
> > > 
> > > > 
> > > > > 
> > > > > You can still implement this check in software. You can have a look at
> > > > > nand_check_erased_ecc_chunk() [1] if you need an example, but you'll
> > > > > > have to adapt it because your controller does not guarantee that ECC
> > > > > bits for a given chunk are byte aligned :-/
> > > > 
> > > > Yes I used this function in the patch. The issue is that I am not quite
> > > > sure yet where to find the raw ECC data (without rereading the page).
> > > > The reference manual is not extremely clear about that, ecc data may be
> > > > > in the 'auxiliary data' but I am not sure that it really is available
> > > > somewhere.
> > > 
> > > AFAIR (and I'm not sure since it was a long time ago), you don't have
> > > direct access to ECC bytes with the GPMI engine. If that's the case,
> > > you'll have to read the ECC bytes manually (moving the page pointer
> > > using ->cmdfunc(NAND_CMD_RNDOUT, column, -1)), which is a pain with
> > > this engine, because ECC bytes are not guaranteed to be byte aligned
> > > (see gpmi ->read_page_raw() implementation).
> > > Once you've retrieved ECC bytes (or bits in this case), for each ECC
> > > chunk, you can use the nand_check_erased_ecc_chunk() function (just make
> > > sure you're padding the last ECC byte of each chunk with ones so that
> > > bitflips cannot be reported on this section).
> > 
> > Thanks for the information. So I understand that this approach is the
> > preferred one to avoid any performance issues for normal operation.
> > 
> > I actually won't be able to fix this patch accordingly for some time. If
> > anyone else needs this earlier, feel free to implement it.
> 
> I just did [1] (it applies on top of your patch), but maybe you
> can test it (I don't have any imx platforms right now) ;).
> 
> If these changes work, feel free to squash them into your previous
> patch.

I've tested your diff on top of Markus Pargmann's patch. It looks promising.

However, I've noticed that the calculation of the ECC parity bit position is
wrong. It ignores the extra metadata bytes at the beginning of the raw page,
and it ignores that the ECC parity bits sit at the end of each ECC chunk. My
test platform is an i.MX6 with two NAND flashes:

    nand: Samsung NAND 1GiB 3,3V 8-bit
    nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
    (-> 104 Bits ECC )

and
 
    nand: AMD/Spansion S34ML08G2
    nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 128
    (-> 234 Bits ECC )
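To make the layout concrete, here is a minimal standalone sketch of the
position calculation for these two flashes. It is only an illustration: the
struct is a hypothetical stand-in for the driver's geometry, and the metadata
size of 10 bytes and the ECC chunk size of 512 bytes are assumptions that are
not stated in this mail (the ECC bit counts are taken as per-chunk values).

```c
/* Hypothetical stand-in for the few geometry fields used by the calculation. */
struct geo {
	unsigned int metadata_size;	/* metadata bytes at the start of the raw page */
	unsigned int ecc_chunk_size;	/* data bytes per ECC chunk */
	unsigned int eccbits;		/* parity bits per ECC chunk */
};

/* Assumed geometries: 10 metadata bytes and 512-byte chunks are guesses. */
static const struct geo samsung_geo = {
	.metadata_size = 10, .ecc_chunk_size = 512, .eccbits = 104,
};
static const struct geo spansion_geo = {
	.metadata_size = 10, .ecc_chunk_size = 512, .eccbits = 234,
};

/*
 * Bit offset of the parity bits of chunk i within the raw page:
 * skip the metadata, skip (i + 1) complete "data + parity" blocks,
 * then step back over the parity bits of chunk i itself.
 */
static unsigned int ecc_bit_offset(const struct geo *g, unsigned int i)
{
	unsigned int offset = g->metadata_size * 8;

	offset += (8 * g->ecc_chunk_size + g->eccbits) * (i + 1);
	offset -= g->eccbits;

	return offset;
}
```

Under these assumptions every chunk of the Samsung flash starts its parity
bits on a byte boundary, while on the Spansion flash only chunk 0 does and
the later chunks start at bit 2, 4 and 6 within a byte.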

I've also tested the bit alignment code. It works correctly for the Spansion
NAND, where the 234 bits of ECC occupy 29.25 bytes on the NAND flash, so the
parity bits are not byte aligned.
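
For illustration, the kind of unaligned extraction the bit alignment code has
to do can be sketched as below, including the padding with ones that Boris
suggested so that unused bits of the last byte are never counted as bitflips.
This is not the driver code: both helper names are hypothetical, and the
LSB-first bit order within a byte is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy nbits bits, starting at bit offset bitoff in src, into the
 * byte-aligned buffer dst. Unused bits of the last dst byte stay at 1.
 * Assumes bit 0 of each byte comes first (LSB-first).
 */
static void extract_ecc_bits(const uint8_t *src, size_t bitoff,
			     size_t nbits, uint8_t *dst)
{
	size_t nbytes = (nbits + 7) / 8;

	/* Start from an "erased" buffer so the padding bits are ones. */
	memset(dst, 0xff, nbytes);

	for (size_t n = 0; n < nbits; n++) {
		size_t s = bitoff + n;

		/* Clear the destination bit only if the source bit is 0. */
		if (!((src[s / 8] >> (s % 8)) & 1))
			dst[n / 8] &= (uint8_t)~(1u << (n % 8));
	}
}

/* Count zero bits, i.e. bitflips if the area is expected to be erased. */
static unsigned int count_zero_bits(const uint8_t *buf, size_t len)
{
	unsigned int zeros = 0;

	for (size_t i = 0; i < len; i++)
		for (unsigned int b = 0; b < 8; b++)
			if (!((buf[i] >> b) & 1))
				zeros++;

	return zeros;
}
```

Because the padding bits are preset to 1, the zero-bit count over the
extracted buffer equals the zero-bit count of the parity bits alone.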

Kind regards,
        Stefan Christ

The corrected ECC parity bit code is:

---->8----

diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
index 2f16d7f..ccae6e6 100644
--- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
+++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
@@ -1054,7 +1054,9 @@ static int gpmi_ecc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
 			int flips;
 
 			/* Read ECC bytes into our internal raw_buffer */
-			offset = ((8 * nfc_geo->ecc_chunk_size) + eccbits) * i;
+			offset = nfc_geo->metadata_size * 8;
+			offset += ((8 * nfc_geo->ecc_chunk_size) + eccbits) * (i + 1);
+			offset -= eccbits;
 			bitoffset = offset % 8;
 			eccbytes = DIV_ROUND_UP(offset + eccbits, 8);
 			offset /= 8;