UBIFS question

Martin Townsend mtownsend1973 at gmail.com
Thu Mar 17 05:54:43 PDT 2016


Hi Ricard, Richard

On Thu, Mar 17, 2016 at 11:43 AM, Ricard Wanderlof
<ricard.wanderlof at axis.com> wrote:
>
>> > We expect the flash devices to start failing quicker than normally
>> > expected due to the environment in which they will be operating, so
>> > sudden NAND blocks turning bad will eventually happen and what we
>> > would like to do is try and capture this as soon as possible.
>> > The boards are not accessible as they will be located in very remote
>> > locations so detecting these failures before the system locks up would
>> > be an advantage so we can report home with the information and fail
>> > over to the other filesystem (providing that hasn't also been
>> > corrupted).
>>
>> Dealing with sudden bad NAND blocks is almost impossible.
>> Unless you have a copy of each block.
>> NAND is not expected to gain bad blocks without an indication like
>> correctable bitflips.

I'm not trying to deal with NAND blocks suddenly going bad; I accept
that this will more than likely happen at some point. What I am
interested in is early detection.  Once the system has booted, most
files will be cached in memory, and the product that the flash devices
are in is designed to run for many months without being power cycled,
so what I'm looking to do is monitor the health of the flash devices
while the system is running.  Ideally I would like to know the FEC
counts, but I doubt I will get this information :) Even checking LEBs,
pages etc. for bad checksums would be great.
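
To make the kind of monitoring I mean concrete, here is a rough sketch.
It assumes the kernel and NAND driver in use export the cumulative
per-device ECC counters (corrected_bits, ecc_failures, bad_blocks) under
/sys/class/mtd/mtdX/; whether they are populated on a given board is an
assumption that would need checking.  The function names and the mtd0
default are made up for illustration.

```python
from pathlib import Path

# Counter attributes as named in the kernel's sysfs-class-mtd ABI
# documentation. Assumption: the running kernel/driver provides them.
COUNTERS = ("corrected_bits", "ecc_failures", "bad_blocks")


def read_mtd_counters(mtd_dir):
    """Return {name: value} for every counter file readable under mtd_dir."""
    stats = {}
    for name in COUNTERS:
        try:
            stats[name] = int((Path(mtd_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # Attribute missing or unreadable on this system: skip it.
            continue
    return stats


def counter_deltas(before, after):
    """Change in each counter that is present in both snapshots."""
    return {k: after[k] - before[k] for k in after if k in before}


if __name__ == "__main__":
    import sys

    # Hypothetical default device; pass another sysfs directory as argv[1].
    mtd_dir = sys.argv[1] if len(sys.argv) > 1 else "/sys/class/mtd/mtd0"
    print(read_mtd_counters(mtd_dir))
```

Taking a snapshot periodically and reporting the deltas home would show
whether corrected bitflips or uncorrectable errors are growing, without
marking anything bad.  The counters only move when the flash is actually
read, so data that sits in the page cache for months would not be
covered unless something re-reads it.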

>
> Yes, although the NAND flash documentation sometimes reads like blocks can
> suddenly 'go bad' for no special reason, in practice it is due to
> excessive erase/write cycles, i.e. it's a wear problem.
>
> However, I don't know, if you are operating the flash in an environment
> where there is cosmic radiation that can actually damage the chip for
> instance, then of course any part of the chip could fail randomly with a
> fairly high probability. But NAND bad block management is not designed to
> take care of that case, which is why bad block detection is only done
> during block erasure (i.e. when a block fails to erase).
>
I'm afraid I'm not sure how much I can say, as I'm under NDA, but
assume that it is going to be operating in an environment where it
receives more cosmic radiation than usual. So could I look at the bad
block detection code to get some ideas?  I don't necessarily want to
mark blocks as bad; I just want to detect them so that I have an idea
that the flash is failing.

Many Thanks,
Martin.

> /Ricard
> --
> Ricard Wolf Wanderlöf                           ricardw(at)axis.com
> Axis Communications AB, Lund, Sweden            www.axis.com
> Phone +46 46 272 2016                           Fax +46 46 13 61 30



More information about the linux-mtd mailing list