howto mark blocks with correctable bitflip errors as bad blocks explicitly

Tue May 23 02:18:23 PDT 2017

Hi list,

Is there a way to mark a block as bad explicitly with one of the
nandwrite/nandtest tools?

I have a device that reports bitflip errors, and I suspect that u-boot is
not correcting those errors, so it fails to verify the CRC checksum in the
uImage header.

$ nanddump --bb=skipbad --omitoob -l ${FSIZE} ${PART} -f $TMPFILE
ECC failed: 0
ECC corrected: 5
Number of bad blocks: 0
Number of bbt blocks: 0
ECC: 1 corrected bitflip(s) at offset 0x003da000
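
To automate anything based on this output, the offset in the last line has
to be extracted first. A minimal sketch in C, assuming the line format is
exactly as shown above; parse_bitflip_line() is a hypothetical helper name,
not something that exists in mtd-utils:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one diagnostic line of the form
 *   "ECC: 1 corrected bitflip(s) at offset 0x003da000"
 * Returns 1 and fills *count and *offset on a match, 0 otherwise.
 * The format string is an assumption taken from the output quoted above.
 */
int parse_bitflip_line(const char *line, unsigned *count,
                       unsigned long long *offset)
{
    assert(line && count && offset);
    return sscanf(line, "ECC: %u corrected bitflip(s) at offset 0x%llx",
                  count, offset) == 2;
}
```

The summary lines ("ECC corrected: 5" and so on) do not match the pattern,
so they are simply skipped.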

The problem happened on a remote system and I don't have access to the
console. I wrote a custom tool to mark that single block as bad, and now
u-boot boots the new kernel. So it seems u-boot is not able to recover from
the same errors as the kernel. It probably uses a weaker ECC algorithm (I
still need to verify this).
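
A minimal sketch of how such a tool might look (an illustration only, not
the actual tool mentioned above). It uses the MEMGETINFO and MEMSETBADBLOCK
ioctls from <mtd/mtd-user.h>; block_start() and mark_block_bad() are
hypothetical names, and the device is whatever MTD character device holds
the partition:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/* Round a byte offset down to the start of its erase block. */
uint64_t block_start(uint64_t offset, uint32_t erasesize)
{
    assert(erasesize > 0);
    return offset - (offset % erasesize);
}

/*
 * Mark the erase block containing `offset` as bad on MTD device `dev`.
 * Returns 0 on success, -1 on error.
 * Caution: a bad block marker is not easily undone, so test on a spare
 * device first.
 */
int mark_block_bad(const char *dev, uint64_t offset)
{
    struct mtd_info_user info;
    __kernel_loff_t start;   /* argument type expected by MEMSETBADBLOCK */
    int fd, ret = -1;

    fd = open(dev, O_RDWR);
    if (fd < 0)
        return -1;

    if (ioctl(fd, MEMGETINFO, &info) != 0)
        goto out;
    if (info.erasesize == 0 || offset >= info.size) {
        errno = EINVAL;
        goto out;
    }

    start = (__kernel_loff_t)block_start(offset, info.erasesize);
    ret = ioctl(fd, MEMSETBADBLOCK, &start);
out:
    close(fd);
    return ret;
}
```

With an erase size of 128 KiB (an assumption; the real value comes from
MEMGETINFO), offset 0x003da000 falls into the block starting at 0x003c0000.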

Since there are other installations, I would like to automate marking blocks
with bitflips as bad too. Does this make sense?  I don't want to update
u-boot, and using a stronger ECC still makes sense for the rootfs, which
u-boot does not access.

Is there already a tool that can do that automatically?  What would be the best
place to add such a function?

I was looking into nandtest and wanted to add a '--zero-error' flag that
forcibly marks blocks with corrected ECC errors as bad. Unfortunately
nandtest does not report those bitflip errors at all.

Every time nanddump reports a bitflip error, the 'ECC failed' counter
increases on the next run. But when running nandtest it never does. Maybe
it's because nanddump reads in min_io_size chunks while nandtest reads back
the whole erase block.
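
To check this guess independently of both tools, one could read the
partition one page at a time and compare the driver's ECC counters before
and after each read. A rough sketch in C, assuming the ECCGETSTATS,
MEMGETINFO and MEMGETBADBLOCK ioctls from <mtd/mtd-user.h> are supported by
the driver; corrected_between() and scan_for_bitflips() are hypothetical
names, and nothing is marked bad here:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/* Bitflips corrected between two ECCGETSTATS snapshots. */
uint32_t corrected_between(const struct mtd_ecc_stats *before,
                           const struct mtd_ecc_stats *after)
{
    assert(before && after);
    return after->corrected - before->corrected;
}

/*
 * Read `dev` in writesize (min_io_size) chunks and print every chunk whose
 * read raised the "corrected" counter. Returns the number of such chunks,
 * or -1 on error. The counters are shared by all readers of the device,
 * so run this while nothing else is reading it.
 */
int scan_for_bitflips(const char *dev)
{
    struct mtd_info_user info;
    struct mtd_ecc_stats before, after;
    unsigned char *buf = NULL;
    int fd, hits = -1;

    fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, MEMGETINFO, &info) != 0 ||
        info.erasesize == 0 || info.writesize == 0)
        goto out;
    buf = malloc(info.writesize);
    if (!buf)
        goto out;

    hits = 0;
    for (uint64_t blk = 0; blk < info.size; blk += info.erasesize) {
        __kernel_loff_t start = (__kernel_loff_t)blk;

        if (ioctl(fd, MEMGETBADBLOCK, &start) > 0)
            continue;   /* block is already marked bad */

        for (uint64_t ofs = blk; ofs < blk + info.erasesize;
             ofs += info.writesize) {
            if (ioctl(fd, ECCGETSTATS, &before) != 0 ||
                pread(fd, buf, info.writesize, (off_t)ofs) < 0 ||
                ioctl(fd, ECCGETSTATS, &after) != 0) {
                hits = -1;
                goto out;
            }
            if (corrected_between(&before, &after) > 0) {
                printf("corrected bitflip(s) at offset 0x%08llx\n",
                       (unsigned long long)ofs);
                hits++;
            }
        }
    }
out:
    free(buf);
    close(fd);
    return hits;
}
```

Changing the read size from info.writesize to info.erasesize in the inner
loop would show whether the chunk size makes any difference to what gets
reported.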

/Andi