[BUG] Nand support broken with v2.6.36-rc1

Tue Aug 17 13:00:39 EDT 2010

Hello,

On 08/17/2010 01:52 AM, Michael Guntsche wrote:
> The only thing that might be special with the nand driver that is being
> used is that a different oob layout is being used.
>
> static struct nand_ecclayout rbppc_nand_oob_16 = {
>    .eccbytes = 6,
>    .eccpos = { 8, 9, 10, 13, 14, 15 },
>    .oobavail = 9,
>    .oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } }
> };

On 08/17/2010 04:36 AM, Michael Guntsche wrote:
> I added this to the nand driver itself.
>
> static uint8_t scan_ff_pattern[] = { 0xff, 0xff };
> static struct nand_bbt_descr rbppc_nand_smallpage = {
>    .options = NAND_BBT_SCAN2NDPAGE,
>    .offs = NAND_SMALL_BADBLOCK_POS,
>    .len = 1,
>    .pattern = scan_ff_pattern
> };
>
> and the driver is working again. But shouldn't this be supported by the stock level code as well?

Why yes, it should! Somebody (probably me) goofed. Your nand_ecclayout 
conflicts with the kernel's choice of bad block position. Recent 
changes must have altered the position the kernel chooses automatically.

One of the following two cases is likely the problem:
(1) Your chip is supposed to use offset 0, not 5, for the bad-block 
marker (BBM) (i.e., NAND_LARGE_BADBLOCK_POS, not NAND_SMALL_BADBLOCK_POS), 
so your ecclayout should not list byte 0 in the "oobfree" array (a 
design flaw dating back to when you first began using this chip).
(2) In the commit you mentioned 
(c7b28e25cb9beb943aead770ff14551b55fa8c79), I made the rules for which 
chips may use NAND_SMALL_BADBLOCK_POS too restrictive.
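To make the conflict concrete, here is a small userspace sketch (not kernel code) that classifies each of the 16 OOB bytes under the quoted layout. The _sketch types, oob_map() and oob_count() are hypothetical, simplified stand-ins for struct nand_ecclayout and struct nand_oobfree, not the real definitions. The two candidate marker offsets are 0 (NAND_LARGE_BADBLOCK_POS) and 5 (NAND_SMALL_BADBLOCK_POS), as in case (1).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified userspace stand-ins for the kernel's
 * struct nand_oobfree and struct nand_ecclayout (not the real ones). */
#define OOB_SIZE 16 /* small-page (512-byte) NAND: 16 OOB bytes per page */
#define MAX_FREE 8
#define MAX_ECC  64

struct oobfree_sketch { uint32_t offset, length; };
struct ecclayout_sketch {
	uint32_t eccbytes;
	uint32_t eccpos[MAX_ECC];
	uint32_t oobavail;
	struct oobfree_sketch oobfree[MAX_FREE]; /* zero length ends the list */
};

enum oob_role { OOB_UNUSED = 0, OOB_FREE, OOB_ECC };

/* Michael's layout, copied from the quoted rbppc_nand_oob_16. */
static const struct ecclayout_sketch rbppc_oob_16 = {
	.eccbytes = 6,
	.eccpos = { 8, 9, 10, 13, 14, 15 },
	.oobavail = 9,
	.oobfree = { { 0, 4 }, { 6, 2 }, { 11, 2 }, { 4, 1 } },
};

/* Record, for each OOB byte, whether the layout uses it for ECC,
 * lists it as free, or leaves it alone. */
static void oob_map(const struct ecclayout_sketch *l, enum oob_role map[OOB_SIZE])
{
	for (int i = 0; i < OOB_SIZE; i++)
		map[i] = OOB_UNUSED;
	for (uint32_t i = 0; i < l->eccbytes; i++) {
		assert(l->eccpos[i] < OOB_SIZE);
		map[l->eccpos[i]] = OOB_ECC;
	}
	for (int f = 0; f < MAX_FREE && l->oobfree[f].length; f++) {
		for (uint32_t b = 0; b < l->oobfree[f].length; b++) {
			assert(l->oobfree[f].offset + b < OOB_SIZE);
			map[l->oobfree[f].offset + b] = OOB_FREE;
		}
	}
}

/* Count the OOB bytes that have a given role. */
static int oob_count(const enum oob_role map[OOB_SIZE], enum oob_role role)
{
	int n = 0;
	for (int i = 0; i < OOB_SIZE; i++)
		n += (map[i] == role);
	return n;
}
```

With this layout, oob_map() leaves exactly one byte, byte 5, outside both "eccpos" and "oobfree", while byte 0 is listed as free. A marker at offset 5 therefore fits, but a marker at offset 0 would land in a byte the layout hands out as free.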

Case (2) is the more likely one, and in fact I have since spotted a 
stupid mistake I made while refactoring the detection code.

I have been studying data from hundreds of flash chips to find where the 
factory bad-block markers should be stored. Unfortunately, I can't 
cover all of them, and so your Hynix chip is likely one that was 
overlooked. Could you send the full NAND ID string (8 bytes, not just 
the manufacturer and chip ID), the exact part number of the flash, and a 
datasheet? Any one of those would help (the datasheet most of all), so 
send whatever you can. More data on your chip would let me pin down the 
problem for sure; I will send a patch ASAP once I get your information.

Sorry for the trouble!

On another note, it may be wise to have the kernel's NAND code check 
for such a conflict between the bad-block marker and the ECC layout. If 
a byte needed by the bad-block marker is listed in "oobfree" or 
"eccpos", then we have a problem. Does that sound like a good idea to 
anybody? If so, what would be the best approach:
* print an error and quit detection
* try to modify the ecclayout, bbm info or both
* try to modify, and fall-back to error message and quit if necessary
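As a rough illustration of the third approach, here is a userspace sketch, assuming simplified stand-ins for the kernel structures and a zero-length entry terminating the "oobfree" list. bbm_fix_layout(), remove_free_byte(), free_count() and the _sketch types are hypothetical names, not existing MTD functions. It refuses layouts where the marker overlaps "eccpos", and otherwise trims the marker bytes out of "oobfree", splitting an entry when the marker sits in its middle.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified userspace stand-ins for the kernel's
 * struct nand_oobfree and struct nand_ecclayout (not the real ones). */
#define MAX_FREE 8
#define MAX_ECC  64

struct oobfree_sketch { uint32_t offset, length; };
struct ecclayout_sketch {
	uint32_t eccbytes;
	uint32_t eccpos[MAX_ECC];
	uint32_t oobavail;
	struct oobfree_sketch oobfree[MAX_FREE]; /* zero length ends the list */
};

/* Number of oobfree entries before the zero-length terminator. */
static int free_count(const struct ecclayout_sketch *l)
{
	int n = 0;
	while (n < MAX_FREE && l->oobfree[n].length)
		n++;
	return n;
}

/* Take OOB byte p out of the oobfree list, splitting an entry if needed. */
static int remove_free_byte(struct ecclayout_sketch *l, uint32_t p)
{
	int n = free_count(l);

	for (int f = 0; f < n; f++) {
		struct oobfree_sketch *e = &l->oobfree[f];
		uint32_t end = e->offset + e->length; /* one past the last byte */

		if (p < e->offset || p >= end)
			continue;
		if (e->length == 1) {
			/* Drop the entry; shift the rest down to keep the list dense. */
			memmove(e, e + 1, (size_t)(n - f - 1) * sizeof(*e));
			l->oobfree[n - 1] = (struct oobfree_sketch){ 0, 0 };
		} else if (p == e->offset) {
			e->offset++;
			e->length--;
		} else if (p == end - 1) {
			e->length--;
		} else {
			/* p is in the middle: split in two, which needs a spare slot. */
			if (n == MAX_FREE)
				return -ENOSPC;
			l->oobfree[n] = (struct oobfree_sketch){ p + 1, end - (p + 1) };
			e->length = p - e->offset;
		}
		l->oobavail--;
		return 0;
	}
	return 0; /* byte was not listed as free: nothing to do */
}

/* Try to make marker bytes [bbpos, bbpos + bblen) neither ECC nor free;
 * return -EINVAL if that is impossible. Works on a copy, so the caller's
 * layout is only updated on success. */
static int bbm_fix_layout(struct ecclayout_sketch *l, uint32_t bbpos, uint32_t bblen)
{
	struct ecclayout_sketch tmp;

	assert(l != NULL);
	tmp = *l;
	for (uint32_t p = bbpos; p < bbpos + bblen; p++) {
		/* ECC bytes cannot simply be given back to the marker. */
		for (uint32_t i = 0; i < tmp.eccbytes; i++) {
			if (tmp.eccpos[i] == p) {
				fprintf(stderr, "BBM byte %u is used for ECC\n", (unsigned)p);
				return -EINVAL;
			}
		}
		if (remove_free_byte(&tmp, p) < 0) {
			fprintf(stderr, "cannot split oobfree around byte %u\n", (unsigned)p);
			return -EINVAL;
		}
	}
	*l = tmp;
	return 0;
}
```

For Michael's layout, a marker at byte 5 leaves everything untouched; a marker at byte 0 shrinks the { 0, 4 } entry to { 1, 3 } and drops oobavail to 8; a marker at byte 8 is rejected because that byte holds ECC. Working on a copy means a failed repair never leaves a half-modified layout behind.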

Thanks,
Brian