pxa3xx_nand issues

Thu Sep 23 07:32:26 EDT 2010

On Thursday 23 September 2010 08:05:56 Eric Miao wrote:
> On Thu, Sep 23, 2010 at 1:12 AM, pieterg <pieterg at gmx.com> wrote:
> > In my search for the cause of the huge number of single/double bit
> > errors I'm experiencing on colibri pxa320/310 devices, I've come across
> > this commit
> >
> > http://git.kernel.org/?p=linux/kernel/git/ycmiao/pxa-linux-2.6.git;a=commit;h=7f9938d0fd6c778bd0ce296a3e3b50266de2b892
> >
> > According to the commitlog, it attempts to work around an issue
> > regarding non-page-aligned reads.
> > The workaround seems to force page-aligned access, by dropping the
> > offset within the page (column address bytes).
> > However, in my setup (with a jffs2 filesystem on nand),
> > non-page-aligned reads never occur, but non-page-aligned writes occur
> > very frequently. (during the jffs2 gc).
> > These are also affected by this commit, while the commitlog does not
> > state whether or not the same issue would occur for the program
> > command, and in that case, whether or not the same workaround would
> > apply.
> >
> > I've tried to revert the commit, but unfortunately this doesn't reduce
> > the huge number of single/double bit errors (and jffs2 crc errors as a
> > result) I'm getting.
> >
> > But having these non-aligned writes during GC, would that indicate a
> > problem with my jffs2 image parameters perhaps?
> > (though I cannot imagine this could actually cause double bit errors)
>
> It might not be related to the commit above.  The NAND controller will
> always read the whole page and ignoring the column address, that patch
> tries to make less confusion. The offset is actually handled completely
> by software (memorized).

I can see how this works for the read offset, but I do not quite see how it 
would work for writes, which call the same prepare_read_prog_cmd and have 
their column address stripped as well.
By the way, I found out that the non-aligned writes are oob writes, at 
offset 2048 within the page. Jffs2 does this when writing cleanmarkers.
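
A minimal sketch of how a write with a stripped column address could still 
land in the right place, assuming the driver keeps the offset in software and 
always programs a full page + oob buffer padded with 0xFF. The 64-byte oob 
size, the function names and the marker bytes are assumptions made for 
illustration; this is not taken from the pxa3xx_nand source.

```python
PAGE_SIZE = 2048  # main data bytes per page (the oob writes above start at offset 2048)
OOB_SIZE = 64     # assumption: spare area size, not stated in this thread


def build_program_buffer(column: int, payload: bytes) -> bytes:
    """Place payload at `column` inside a full page + oob buffer padded with 0xFF.

    Hypothetical helper: the column is remembered in software, and the
    controller is always handed the whole buffer starting at column 0.
    """
    total = PAGE_SIZE + OOB_SIZE
    if column < 0 or column + len(payload) > total:
        raise ValueError("payload does not fit in page + oob")
    buf = bytearray(b"\xff" * total)
    buf[column:column + len(payload)] = payload
    return bytes(buf)


def program(cells: bytes, buf: bytes) -> bytes:
    """Model NAND programming: bits can only change from 1 to 0 (bitwise AND)."""
    return bytes(c & b for c, b in zip(cells, buf))


if __name__ == "__main__":
    marker = bytes([0x12, 0x34])  # placeholder bytes, not the real jffs2 cleanmarker
    buf = build_program_buffer(PAGE_SIZE, marker)
    page = bytes([0xA5]) * PAGE_SIZE + b"\xff" * OOB_SIZE
    after = program(page, buf)
    print(after[:PAGE_SIZE] == page[:PAGE_SIZE])  # main data area is left untouched
```

Under these assumptions the 0xFF padding leaves the main data area as it was, 
so an oob-only write does not need a column address on the wire.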

However, I also think that this is probably unrelated to my problems.
In fact, the bit errors always occur on the same pages.
I could identify about 10 eraseblocks with pages which produce single/double 
bit errors.
After I marked them bad (manually), I've seen no more bit errors, and the 
jffs2 rootfs has remained perfectly healthy.
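
A rough sketch of how failing pages can be grouped into eraseblocks to find 
candidates for marking bad. The value of 64 pages per eraseblock and the 
function name are assumptions for illustration; the page numbers would come 
from whatever logs the ECC errors.

```python
from collections import Counter

PAGES_PER_BLOCK = 64  # assumption: depends on the NAND part, not stated in this thread


def blocks_with_errors(error_pages):
    """Map page numbers that reported bit errors to {eraseblock: error count}."""
    counts = Counter(page // PAGES_PER_BLOCK for page in error_pages)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical page numbers, only to show the grouping.
    for block, count in blocks_with_errors([0, 63, 64, 130, 130]).items():
        print(f"eraseblock {block}: {count} error report(s)")
```

If the same few eraseblocks keep showing up, as they do here, that points at 
specific blocks rather than at the access pattern.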

Apparently a double bit error is not a reason to consider a block bad; jffs2 
does not mark a block bad until erasing it has failed more than 2 times.
But it seems the nand controller (or at least the pxa3xx_nand driver) 
doesn't report any problems when erasing these blocks. (I will investigate 
this further.)
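
The policy described above can be sketched as follows. This is a model of the 
behaviour as I understand it, not the jffs2 code; the class, the method names 
and the constant are hypothetical, and the limit of 2 is taken from the 
description above rather than checked against the source.

```python
ERASE_FAILURE_LIMIT = 2  # assumption: from the description above, not verified


class EraseBlock:
    """Hypothetical bookkeeping for one eraseblock."""

    def __init__(self):
        self.erase_failures = 0
        self.ecc_errors = 0
        self.bad = False

    def on_ecc_error(self):
        # Bit errors are counted here but never feed into the bad-block decision.
        self.ecc_errors += 1

    def on_erase_result(self, ok: bool):
        # Only failed erases count; the block goes bad once the limit is exceeded.
        if ok:
            return
        self.erase_failures += 1
        if self.erase_failures > ERASE_FAILURE_LIMIT:
            self.bad = True


if __name__ == "__main__":
    blk = EraseBlock()
    for _ in range(100):
        blk.on_ecc_error()
    blk.on_erase_result(True)
    print(blk.bad)  # still False: erases succeed, so the block stays in use
```

With such a policy, a block whose erases always report success is never 
retired, no matter how many bit errors its pages produce.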

I would happily blame this on the NAND which might be bad, if this were just 
a single board instead of all colibri pxa320/310 boards I've tried so far, 
more than 5 in total.

Rgds, Pieter