OneNAND: Rate of write errors

Thu Feb 22 11:35:35 EST 2007

Thinking further about the numerous write errors on the OneNAND part,
one symptom stood out: when we see the -EBADMSG error return, there is
no corresponding fault reported in the ECC status register.
Consequently, we concluded that the bufferram may be getting corrupted
before the data is ever committed to the NAND array.

Hence, we rewrote the setup code in the onenand_write procedure to
write the bufferram, check it, and retry if the check fails:

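        // retrys must be 0 on entry to this loop
        do
        {
            // copy the page into the bufferram
            this->write_bufferram (mtd,
                                   ONENAND_DATARAM,
                                   wbuf,
                                   0,
                                   mtd->writesize);

            // check the bufferram contents against the source data
            ret = onenand_do_check_bufferram (mtd,
                                              ONENAND_DATARAM,
                                              wbuf,
                                              0,
                                              mtd->writesize);

            if (ret != 0) // then
            {
                retrys = retrys + 1;

                printk (KERN_WARNING
                        "onenandwrite: bad buffer ram, retrying (%d)\n",
                        retrys);
            } // end if
        } while (ret != 0 &&
                 retrys < max_retrys);

        // every attempt failed the check: report the error and
        // leave the enclosing write loop
        if (retrys >= max_retrys) // then
        {
            ret = -EBADMSG;

            break;
        } // end if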

We set max_retrys to three (we have seen a write need a second
attempt), and this works every time.  No more errors are reported back
to the JFFS2 system, and the file system mounts and unmounts cleanly.
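
For anyone who wants to try the retry logic away from the hardware,
here is a minimal user-space sketch of the same write, check, retry
pattern.  It is not the driver code: fake_bufferram, fake_write,
fake_check, write_verified and SKETCH_PAGE_SIZE are all hypothetical
names made up for this illustration, and the "corruption" is simply a
bit flipped on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 2048   /* assumed page size, for illustration only */

/* Hypothetical stand-in for the BufferRAM; not the OneNAND driver. */
struct fake_bufferram {
    unsigned char data[SKETCH_PAGE_SIZE];
    int faults_left;            /* how many upcoming writes to corrupt */
};

/* Copy src into the fake buffer, flipping one bit while faults remain. */
static void fake_write(struct fake_bufferram *ram,
                       const unsigned char *src, size_t len)
{
    memcpy(ram->data, src, len);
    if (ram->faults_left > 0) {
        ram->data[0] ^= 0x01;   /* simulated corruption */
        ram->faults_left--;
    }
}

/* Return 0 if the buffer matches src, nonzero otherwise. */
static int fake_check(const struct fake_bufferram *ram,
                      const unsigned char *src, size_t len)
{
    return memcmp(ram->data, src, len) != 0;
}

/*
 * Write, check, and retry up to max_tries times.  Returns the number
 * of attempts used on success, or -1 if every attempt failed the check.
 */
static int write_verified(struct fake_bufferram *ram,
                          const unsigned char *src, size_t len,
                          int max_tries)
{
    int attempt;

    assert(len <= SKETCH_PAGE_SIZE && max_tries > 0);

    for (attempt = 1; attempt <= max_tries; attempt++) {
        fake_write(ram, src, len);
        if (fake_check(ram, src, len) == 0)
            return attempt;
    }
    return -1;
}
```

With one injected fault the write succeeds on the second attempt, which
matches what we have observed on the real part; with as many faults as
allowed attempts the function gives up, as the driver does when it
returns -EBADMSG.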

This confirms our suspicion that the buffer was being corrupted before
it was committed.  Does anyone have any idea how or why the data in
the bufferram might be corrupted?
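
One way to narrow this down might be to record what the corruption
looks like each time the check fails.  The sketch below is only a
suggestion: mismatch_report and compare_buffers are hypothetical names,
not part of the driver, and the function assumes the bufferram contents
have already been copied into an ordinary memory buffer.  It reports
how many bytes differ, where the first difference is, and which bit
positions were flipped.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of the differences between two buffers (hypothetical type). */
struct mismatch_report {
    size_t count;            /* number of differing bytes */
    size_t first_offset;     /* first differing offset; valid if count > 0 */
    unsigned char xor_mask;  /* OR of (expected ^ actual) over all bytes */
};

/* Compare len bytes of expected and actual and summarise the mismatches. */
static struct mismatch_report compare_buffers(const unsigned char *expected,
                                              const unsigned char *actual,
                                              size_t len)
{
    struct mismatch_report r = { 0, 0, 0 };
    size_t i;

    assert(expected != NULL && actual != NULL);

    for (i = 0; i < len; i++) {
        unsigned char diff = (unsigned char)(expected[i] ^ actual[i]);

        if (diff != 0) {
            if (r.count == 0)
                r.first_offset = i;
            r.count++;
            r.xor_mask |= diff;
        }
    }
    return r;
}
```

If the same offsets or the same bit positions keep showing up, that
would point in a different direction than differences scattered at
random through the page.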

Julianne C.