howto mark blocks with correctable bitflip errors as bad blocks explicitly

Richard Weinberger richard.weinberger at gmail.com
Tue May 23 02:29:42 PDT 2017


Andreas,

On Tue, May 23, 2017 at 11:18 AM, Andreas Fenkart <afenkart at gmail.com> wrote:
> Hi list,
>
> Is there a way to mark a block as bad explicitly with one of the
> nandwrite/nandtest tools?

Not explicitly. AFAIK nandtest marks a block as bad only if erasing it fails.
But you can write a tiny C program that uses the MEMSETBADBLOCK
ioctl().

You should mark a block as bad only as a last resort, and only when you are
really sure that the block is bad.

> I had a device that reports bitflip errors, and I had the suspicion that
> the u-boot is not correcting those errors, hence it fails to verify the CRC
> checksum in the uimage header.
>
> $ nanddump --bb=skipbad --omitoob -l ${FSIZE} ${PART} -f $TMPFILE
> ECC failed: 0
> ECC corrected: 5
> Number of bad blocks: 0
> Number of bbt blocks: 0
> ECC: 1 corrected bitflip(s) at offset 0x003da000

A single bitflip should be no problem for any ECC engine. :)
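
The counters nanddump prints above can be read with the ECCGETSTATS ioctl
(struct mtd_ecc_stats, from <mtd/mtd-user.h>). A rough sketch of how a tool
can tell corrected bitflips from real ECC failures by comparing the counters
before and after a read. The names classify_read() and read_checked() are
invented for this example, and it assumes nothing else is reading the same
device at the same time, since the counters are per device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <mtd/mtd-user.h>

enum read_result { READ_CLEAN, READ_CORRECTED, READ_FAILED };

/* Compare two ECC counter snapshots taken around a read. */
enum read_result classify_read(const struct mtd_ecc_stats *before,
			       const struct mtd_ecc_stats *after)
{
	if (after->failed != before->failed)
		return READ_FAILED;	/* uncorrectable error, data is suspect */
	if (after->corrected != before->corrected)
		return READ_CORRECTED;	/* bitflips occurred but were repaired */
	return READ_CLEAN;
}

/*
 * Read `len` bytes at `offset` from an open MTD character device and
 * report what the ECC engine did. Returns the byte count or -1 on error.
 */
ssize_t read_checked(int fd, off_t offset, void *buf, size_t len,
		     enum read_result *result)
{
	struct mtd_ecc_stats before, after;
	ssize_t n;

	assert(buf != NULL && result != NULL);

	if (ioctl(fd, ECCGETSTATS, &before) < 0) {
		perror("ECCGETSTATS");
		return -1;
	}
	n = pread(fd, buf, len, offset);
	if (n < 0) {
		perror("pread");
		return -1;
	}
	if (ioctl(fd, ECCGETSTATS, &after) < 0) {
		perror("ECCGETSTATS");
		return -1;
	}
	*result = classify_read(&before, &after);
	return n;
}
```

A READ_CORRECTED result means the data returned is good. Only READ_FAILED
indicates data the ECC engine could not repair.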

> The problem happened on a remote system and I don't have access to the
> console. I made a custom tool to mark that single block as bad and now,
> u-boot is booting the new kernel. Hence it seems u-boot is not able to
> recover the same errors as the kernel. Probably it uses a weaker ECC
> algorithm. (need to verify this)

Please double check. Marking blocks as bad is not a good solution.

> Since there are other installations, I would like to automate marking
> bitflips as errors too. Does this make sense?  I don't want to update u-boot
> and using a stronger ECC still makes sense for the rootfs that u-boot is not
> accessing.
>
> Is there already a tool that can do that automatically?  What would be the best
> place to add such a function?

Please no. Correctable bitflips are perfectly fine and can happen all the time.
If you're using UBI it will test blocks and mark them as bad if they really turn
out to be bad.

> I was looking into nandtest and wanted to add some '--zero-error' flag to mark
> buffers with recovered ECC errors as bad forcefully. Unfortunately nandtest
> does not report those bitflip errors at all.
>
> Every time nanddump reports a bitflip error the 'ECC failed' counter increases on
> the next run. But when running nandtest it never does. Maybe it's because
> nanddump reads in min_io_size chunks while nandtest reads back the whole erase
> block.

What NAND driver is that? To me it seems like your ECC setup has problems.

-- 
Thanks,
//richard



More information about the linux-mtd mailing list