How to explicitly mark blocks with correctable bitflip errors as bad blocks
Andreas Fenkart
afenkart at gmail.com
Tue May 23 02:18:23 PDT 2017
Hi list,
Is there a way to mark a block as bad explicitly with one of the
nandwrite/nandtest tools?
I had a device that reports bitflip errors, and I suspected that U-Boot
does not correct those errors and therefore fails to verify the CRC in the
uImage header.
$ nanddump --bb=skipbad --omitoob -l ${FSIZE} ${PART} -f $TMPFILE
ECC failed: 0
ECC corrected: 5
Number of bad blocks: 0
Number of bbt blocks: 0
ECC: 1 corrected bitflip(s) at offset 0x003da000
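nanddump reports the offset relative to the start of the MTD device it reads (here ${PART}). Bad-block marking works on whole erase blocks, so that offset first has to be mapped to its erase block. The following is a minimal sketch. It assumes a 128 KiB (0x20000) erase size, which is common for SLC NAND but is not stated in this report. The real value can be read from /sys/class/mtd/mtdN/erasesize (mtdN is a placeholder) or obtained with the MEMGETINFO ioctl. The function names are illustrative only.

```c
#include <stdint.h>

/* Index of the erase block that contains byte offset `ofs`
 * (offset relative to the start of the MTD partition). */
static uint64_t block_index(uint64_t ofs, uint32_t erasesize)
{
    return ofs / erasesize;
}

/* Start offset of that erase block: `ofs` rounded down to a multiple
 * of `erasesize`. This is the offset to hand to the bad-block ioctls. */
static uint64_t block_start(uint64_t ofs, uint32_t erasesize)
{
    return ofs - ofs % erasesize;
}
```

With the assumed 0x20000 erase size, block_index(0x3da000, 0x20000) is 30 and block_start(0x3da000, 0x20000) is 0x3c0000. The bitflip reported above would therefore lie in erase block 30.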
The problem happened on a remote system and I don't have access to the
console. I wrote a custom tool to mark that single block as bad, and now
U-Boot boots the new kernel. So it seems U-Boot is not able to correct the
same errors as the kernel, probably because it uses a weaker ECC algorithm
(I still need to verify this).
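The custom tool itself is not shown. A tool like it can be built on the MTD character-device ioctls from <mtd/mtd-user.h>. MEMGETINFO gives the erase size, MEMGETBADBLOCK checks the block's current state, and MEMSETBADBLOCK asks the driver to mark the block bad. The sketch below uses those ioctls under stated assumptions. It is not the tool that was actually used, and mark_block_bad is a hypothetical name. Marking a block bad is not easily undone, since the usual tools (flash_erase, for example) skip bad blocks afterwards.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/* Hypothetical helper: mark the erase block containing `ofs` as bad on the
 * MTD character device `dev` (e.g. "/dev/mtdN", a placeholder).
 * Returns 0 on success or if the block was already bad, -1 on error. */
static int mark_block_bad(const char *dev, unsigned long long ofs)
{
    struct mtd_info_user info;
    __kernel_loff_t start;
    int fd, bad, ret = -1;

    fd = open(dev, O_RDWR);             /* marking needs write access */
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (ioctl(fd, MEMGETINFO, &info) < 0) {
        perror("MEMGETINFO");
        goto out;
    }
    /* Round down to the start of the erase block. */
    start = (__kernel_loff_t)(ofs - ofs % info.erasesize);

    /* MEMGETBADBLOCK: > 0 means bad, 0 means good, < 0 is an error. */
    bad = ioctl(fd, MEMGETBADBLOCK, &start);
    if (bad < 0) {
        perror("MEMGETBADBLOCK");
        goto out;
    }
    if (bad > 0) {
        fprintf(stderr, "block at 0x%llx is already bad\n",
                (unsigned long long)start);
        ret = 0;
        goto out;
    }
    /* Ask the driver to record this erase block as bad. */
    if (ioctl(fd, MEMSETBADBLOCK, &start) < 0) {
        perror("MEMSETBADBLOCK");
        goto out;
    }
    printf("marked block at 0x%llx as bad\n", (unsigned long long)start);
    ret = 0;
out:
    close(fd);
    return ret;
}
```

Called as mark_block_bad("/dev/mtdN", 0x3da000), with the partition's real device node in place of the placeholder, it would mark the erase block containing the reported offset. It needs write access to the device, which normally means root.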
Since there are other installations, I would like to automate marking
blocks with bitflips as bad too. Does this make sense? I don't want to update
U-Boot, and using a stronger ECC still makes sense for the rootfs, which
U-Boot does not access.
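One way to automate this is to mirror how nanddump detects corrections. A read from the MTD character device still returns data when ECC had to correct bitflips, so the corrections only show up in the counters returned by the ECCGETSTATS ioctl. Reading a block and comparing the "corrected" counter before and after shows whether that block needed correction. The following is a minimal sketch on that basis. scan_and_mark and its threshold and dry_run parameters are hypothetical, not an existing mtd-utils feature.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/* Hypothetical scanner: read every good erase block of `dev` in writesize
 * (page) chunks, as nanddump does, and use the ECCGETSTATS "corrected"
 * counter to find blocks that needed ECC correction. Blocks with at least
 * `threshold` corrected bitflips are reported and, unless `dry_run` is set,
 * marked bad. Returns the number of flagged blocks, or -1 on error. */
static int scan_and_mark(const char *dev, unsigned threshold, int dry_run)
{
    struct mtd_info_user info;
    struct mtd_ecc_stats before, after;
    unsigned char *page = NULL;
    int fd, flagged = 0;

    fd = open(dev, dry_run ? O_RDONLY : O_RDWR);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (ioctl(fd, MEMGETINFO, &info) < 0 || !(page = malloc(info.writesize)))
        goto fail;

    for (__kernel_loff_t blk = 0; blk < info.size; blk += info.erasesize) {
        int bad = ioctl(fd, MEMGETBADBLOCK, &blk);
        if (bad < 0)
            goto fail;
        if (bad > 0)
            continue;               /* already bad: skip, like --bb=skipbad */

        if (ioctl(fd, ECCGETSTATS, &before) < 0)
            goto fail;
        for (__kernel_loff_t ofs = blk; ofs < blk + info.erasesize;
             ofs += info.writesize)
            if (pread(fd, page, info.writesize, (off_t)ofs) !=
                (ssize_t)info.writesize)
                goto fail;
        if (ioctl(fd, ECCGETSTATS, &after) < 0)
            goto fail;

        /* Unsigned subtraction also copes with counter wrap-around. */
        unsigned flips = after.corrected - before.corrected;
        if (flips == 0 || flips < threshold)
            continue;
        printf("block at 0x%llx: %u corrected bitflip(s)\n",
               (unsigned long long)blk, flips);
        if (!dry_run && ioctl(fd, MEMSETBADBLOCK, &blk) < 0)
            goto fail;
        flagged++;
    }
    goto out;
fail:
    perror(dev);
    flagged = -1;
out:
    free(page);
    close(fd);
    return flagged;
}
```

The threshold is a policy knob. A value of 1 reproduces the "mark on any corrected bitflip" behaviour discussed here, while larger values only retire blocks that need more correction. Running it with dry_run set first shows what would be marked without writing anything to the flash.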
Is there already a tool that can do that automatically? What would be the best
place to add such a function?
I was looking into nandtest and wanted to add a '--zero-error' flag that
forcibly marks blocks with corrected ECC errors as bad. Unfortunately,
nandtest does not report those bitflip errors at all.
Every time nanddump reports a bitflip error, the 'ECC failed' counter
increases on the next run, but it never does when running nandtest. Maybe
that is because nanddump reads in min_io_size chunks while nandtest reads
back the whole erase block.
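To check that hypothesis, one could read the same erase block both ways and compare how much the ECCGETSTATS "corrected" counter grows each time. The following is a minimal sketch. probe_block and corrected_delta are hypothetical names. The two chunk sizes simply mirror the description above: the whole erase block in one read, or the NAND page size (writesize, which mtd-utils reports as min_io_size).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

/* Read `len` bytes at `ofs` using preads of `chunk` bytes and return how
 * much the ECCGETSTATS "corrected" counter grew, or -1 on error. */
static long corrected_delta(int fd, off_t ofs, size_t len, size_t chunk,
                            unsigned char *buf)
{
    struct mtd_ecc_stats before, after;

    if (ioctl(fd, ECCGETSTATS, &before) < 0)
        return -1;
    for (size_t done = 0; done < len; done += chunk)
        if (pread(fd, buf, chunk, ofs + (off_t)done) != (ssize_t)chunk)
            return -1;
    if (ioctl(fd, ECCGETSTATS, &after) < 0)
        return -1;
    return (long)(after.corrected - before.corrected);
}

/* Hypothetical probe: read the erase block at `blk` once in a single pread
 * and once in writesize chunks, then print the corrected-bitflip count seen
 * each way. Returns 0 on success, -1 on error. */
static int probe_block(const char *dev, off_t blk)
{
    struct mtd_info_user info;
    unsigned char *buf = NULL;
    long whole, paged;
    int fd, ret = -1;

    fd = open(dev, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (ioctl(fd, MEMGETINFO, &info) < 0 || !(buf = malloc(info.erasesize)))
        goto out;

    whole = corrected_delta(fd, blk, info.erasesize, info.erasesize, buf);
    paged = corrected_delta(fd, blk, info.erasesize, info.writesize, buf);
    if (whole < 0 || paged < 0)
        goto out;

    printf("block 0x%llx: whole-block read %ld, page-sized reads %ld corrected\n",
           (unsigned long long)blk, whole, paged);
    ret = 0;
out:
    if (ret < 0)
        perror(dev);
    free(buf);
    close(fd);
    return ret;
}
```

If both read patterns report the same count on a block that nanddump flags, the read size is probably not what hides the bitflips from nandtest.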
/Andi
More information about the linux-mtd mailing list