Does modern UBI/UBIFS still suffer from the 'unstable bits issue'?

Fri Mar 2 02:07:10 PST 2018

Tim,

On Friday, 2 March 2018, 02:19:54 CET, Tim Harvey wrote:
> On Thu, Mar 1, 2018 at 8:32 AM, Richard Weinberger <richard at nod.at> wrote:
> > Tim,
> > 
> > On Thursday, 1 March 2018, 17:15:44 CET, Tim Harvey wrote:
> >> Greetings,
> >> 
> >> I have a user with an IMX6 and raw NAND using UBI/UBIFS who has been
> >> able to reproduce a NAND corruption:
> > 
> > What does your user do to reproduce this?
> 
> Richard,
> 
> It's unclear at the moment. It's one of those 'this happened twice on
> two different boards' reports without a lot of detail. However I do
> know they do write to the filesystem on every boot and do encounter
> random power-cuts.
> 
> >> [   10.611972] UBIFS (ubi0:2): background thread "ubifs_bgt0_2" started, PID 631
> >> [   10.634365] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 253952 bytes from PEB 2807:8192, read only 253952 bytes, retry
> >> [   10.657492] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 253952 bytes from PEB 2807:8192, read only 253952 bytes, retry
> >> [   10.681137] ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 253952 bytes from PEB 2807:8192, read only 253952 bytes, retry
> >> [   10.704267] ubi0 error: ubi_io_read: error -74 (ECC error) while reading 253952 bytes from PEB 2807:8192, read 253952 bytes

BTW: I'm missing a backtrace here. How did you obtain these messages?
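
As a side note, the quoted ubi_io_read messages can be picked apart mechanically, which makes it easier to see that every retry hits the same PEB and range. Below is a minimal sketch, assuming the message format is exactly the one quoted above; the function name parse_ubi_read_errors and the SAMPLE excerpt are made up for illustration and are not part of any UBI tooling. Error -74 is -EBADMSG on Linux, which the MTD layer uses to report uncorrectable ECC errors.

```python
import re

# Matches only the ubi_io_read message format seen in the quoted log.
UBI_READ_RE = re.compile(
    r"ubi(?P<dev>\d+) (?P<level>warning|error): ubi_io_read: "
    r"error (?P<err>-?\d+) \((?P<desc>[^)]*)\) while reading "
    r"(?P<length>\d+) bytes from PEB (?P<peb>\d+):(?P<offset>\d+)"
)


def parse_ubi_read_errors(log_text):
    """Return one dict per ubi_io_read message found in log_text."""
    # Strip mail quote markers, then collapse all whitespace so that
    # messages wrapped across several lines match as one string.
    unquoted = [re.sub(r"^(>\s?)+", "", line) for line in log_text.splitlines()]
    flat = " ".join(" ".join(unquoted).split())
    events = []
    for m in UBI_READ_RE.finditer(flat):
        length, offset = int(m["length"]), int(m["offset"])
        events.append({
            "level": m["level"],
            "errno": int(m["err"]),
            "peb": int(m["peb"]),
            "offset": offset,
            "length": length,
            "end": offset + length,  # first byte past the requested range
        })
    return events


# Hypothetical excerpt, shortened from the log quoted in this thread.
SAMPLE = """\
> >> PID 631 [   10.634365] ubi0 warning: ubi_io_read: error -74 (ECC error)
> >> while reading 253952 bytes from PEB 2807:8192, read only 253952 bytes,
> >> retry [ 10.704267] ubi0 error: ubi_io_read: error -74 (ECC error) while reading
> >> 253952 bytes from PEB 2807:8192, read 253952 bytes
"""

if __name__ == "__main__":
    for event in parse_ubi_read_errors(SAMPLE):
        print(event)
```

For the messages above, offset plus length is 8192 + 253952 = 262144 bytes (256 KiB). That would be consistent with a read covering everything after the first 8192 bytes of a 256 KiB PEB, but the actual geometry should be confirmed on the device (for example with mtdinfo) rather than inferred from the log.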

> >> The kernel they are using is a bit out of date but does have
> >> 'gpmi-nand: Handle ECC Errors in erased pages' [1] patch
> >> 
> >> I'm wondering if the 'unstable bits issue' [2] is still an issue or if
> >> the UBI/UBIFS documentation is out of date and this has been resolved.
> >> If it has been resolved, can anyone point me to the patches?
> > 
> > This issue is highly theoretical and I have never actually seen it in the
> > wild. Every single time someone claimed to suffer from it, it turned out
> > to be something else. For these reasons UBI/UBIFS currently has no
> > countermeasure.
> > This reminds me that we have to update the website...
> > 
> > So did you verify (with your NAND vendor) that this really is the named
> > issue?
> 
> I have no idea if what the user reported is the unstable bits issue
> but the fact you've never seen it occur in the wild tells me probably
> not.

I'd be surprised, but you never know. :-)

Just to be sure, this is SLC NAND, right?

> They are using a rather old kernel (4.4 but with a patch to gpmi-nand
> backported from 4.7). I will set up a controlled test with random
> power-cuts in a test fixture I have to see if I can get it to recur
> on a) the old kernel and then b) the current kernel.

Thanks,
//richard