UBIFS and hardware ECC of all FF pages of MLC NAND

Tue Sep 29 12:13:55 EDT 2009

> -----Original Message-----
> From: Artem Bityutskiy [mailto:dedekind at infradead.org] 
> Sent: Tuesday, September 29, 2009 8:42 AM
> To: Darwin Rambo
> Cc: Matthieu CASTET; linux-mtd at lists.infradead.org; Adrian Hunter
> Subject: RE: UBIFS and hardware ECC of all FF pages of MLC NAND
> 
> On Tue, 2009-09-29 at 06:26 -0700, Darwin Rambo wrote:
> > Artem,
> > 
> > One thing you might add is a paranoid check for the OOB being set to 0xFF
> > before programming a page. If someone programs trailing pages in a block
> > of 0xFF by mistake, and puts a non-0xFF ECC in the OOB, then the UBIFS code
> > would write to an already written ECC, which I have found to corrupt other
> > blocks' ECCs on my part. It also gives strange error messages and refuses
> > to mount on reboot. The messages do not look like they are related to the
> > original ECC write problem so it is harder to debug.
> 
> Do you mean extending the 'ubi_dbg_check_all_ff()' check to make it also
> read OOB to make sure there are only 0xFF bytes? Well, it might be useful,
> but I would prefer to get a patch from someone, rather than implementing
> this myself. :-)

That's what I meant. I am not very patch-aware but will consider trying.
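
To make the idea concrete, here is a minimal sketch in Python rather than
kernel code. It is not the existing ubi_dbg_check_all_ff() implementation;
the function names and the way the page data and OOB bytes are passed in
are assumptions made only for illustration.

```python
def is_all_ff(buf: bytes) -> bool:
    """Return True if every byte in buf is 0xFF (an empty buffer counts as erased)."""
    return buf.count(0xFF) == len(buf)


def check_erased_before_program(data: bytes, oob: bytes) -> None:
    """Hypothetical paranoid check: refuse to program a page that is not erased.

    Raises ValueError if either the page data or its OOB area already
    contains non-0xFF bytes, e.g. an ECC written for an all-0xFF page.
    """
    if not is_all_ff(data):
        raise ValueError("page data is not all 0xFF before programming")
    if not is_all_ff(oob):
        raise ValueError("OOB is not all 0xFF: an ECC was already written")
```

A caller would read back the page and its OOB, run the check, and only
then program the page, so the failure is reported at the point of the bad
write instead of at the next mount.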

> 
> > With this particular error, you can see messages like below:
> > 
> > UBIFS error (pid 245): ubifs_read_node: bad node type (255 but expected 2)
> > UBIFS error (pid 245): ubifs_read_node: bad node at LEB 73:456392
> > UBI error: ubi_io_read: error -74 while reading 64 bytes from PEB 3:0, read 64 bytes
> 
> Well, here I already see that the problem is at the driver level because I
> cannot read data. Also, if your driver prints an error message in case
> of uncorrectable ECC errors, this could help.

That's probably the easiest solution.

> 
> > UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 17, need 19
> > UBI warning: ubi_eba_copy_leb: error -74 while reading data from PEB 3
> > UBI error: wear_leveling_worker: error -74 while moving PEB 3 to PEB 2
> > UBI warning: ubi_ro_mode: switch to read-only mode
> > UBI error: do_work: work failed with error code -74
> > UBI error: ubi_thread: ubi_bgt0d: work failed with error code -74
> > UBI error: ubi_io_read: error -74 while reading 516096 bytes from PEB 3:8192, read 516096 bytes
> > UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 1:8192
> > UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 1:8192
> > UBIFS error (pid 1): ubifs_scan: LEB 1 scanning failed
> > UBI error: ubi_io_read: error -74 while reading 516096 bytes from PEB 3:8192, read 516096 bytes
> > UBIFS error (pid 1): ubifs_recover_master_node: failed to recover master node
> 
> ... snip ...
> 
> > A better error message would say something like:
> > "UBI error: Data page incorrectly programmed to all 0xFFs with non-0xFF ECC."
> 
> Probably, but that would happen only if you have debugging checks
> enabled, right?

Right.

> 
> > Another suggestion is rather than creating large files stuffed with 0xFF
> > pads at the end of some of the blocks, to have a ubinize option which
> > creates a download header in front of each block with block length and
> > valid data length. Then the 0xFFs wouldn't have to be carried around and
> > the user would be less likely to program 0xFFs by mistake. They would
> > typically only program the useful data that is in the file instead, and
> > since they erased the block to program, the trailing 0xFFs would be taken
> > care of automatically. Of course, this would require custom flasher
> > changes to accommodate. Thanks.
> 
> It is doable, but I can predict that other people will then complain and
> ask why the heck they cannot use simple nandwrite when flashing UBI images.
> And for many people who have HW which has no problems with writing 0xFFs,
> plain nandwrite is usable.

This is for an embedded system in which we do a serial download initially,
and then upgrade block by block over the network via Ethernet or wireless,
so we don't use nandwrite at this time. I wasn't suggesting changing the
default behaviour of ubinize, just adding a switch for embedded types and
also to avoid accidental programming of these regions. However, if it's too
confusing, then it may not be worth it.
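
For illustration only, the per-block download header could look roughly
like the Python sketch below. No such ubinize option exists; the header
layout (two little-endian 32-bit fields: block length, valid data length),
the function names, and the rounding of valid data up to whole pages are
all assumptions of mine.

```python
import struct

# Hypothetical header: block length, then valid data length (little-endian u32).
HEADER = struct.Struct("<II")


def pack_blocks(image: bytes, peb_size: int, page_size: int) -> bytes:
    """Prefix each erase block with a header and drop its trailing 0xFF pages."""
    out = bytearray()
    for off in range(0, len(image), peb_size):
        block = image[off:off + peb_size]
        used = len(block.rstrip(b"\xff"))          # bytes before trailing 0xFF padding
        valid = -(-used // page_size) * page_size  # round up to whole pages
        valid = min(valid, len(block))
        out += HEADER.pack(len(block), valid)
        out += block[:valid]
    return bytes(out)


def unpack_blocks(packed: bytes):
    """Yield (block_length, valid_data) pairs, as a custom flasher would read them."""
    pos = 0
    while pos < len(packed):
        block_len, valid = HEADER.unpack_from(packed, pos)
        pos += HEADER.size
        yield block_len, packed[pos:pos + valid]
        pos += valid
```

The flasher would erase each block and program only the valid data, so the
trailing 0xFF pages are never written at all.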

> 
> But how many 0xFFs are there? There should not be that many. We pad
> special areas like the UBIFS log, the UBI volume table, the UBIFS lprops
> area, and the UBIFS master area with 0xFF, but that is it. Your _data_,
> i.e., the FS contents, is not "stuffed with 0xFFs"; it is only those
> special UBIFS areas.

Yes, it is only the special UBIFS areas.

It is a bigger problem with 512K erase blocks. In this case, my 6MB jffs2
image grows to a ubifs image of over 14MB due to padding. There are about 12
partial blocks with little data in the first few pages, and about 4 partial
blocks at the end. 16 partial blocks is about 8MB of overhead on 6MB of
real content.
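
As a rough check of that arithmetic, here is a short Python sketch. It
treats each partial block as almost entirely padding, which is an
approximation, so the result is an upper bound; the function name is just
for illustration.

```python
def padding_upper_bound_mib(partial_blocks: int, peb_size_kib: int) -> float:
    """Upper bound on padding, in MiB, if every partial block were all padding."""
    return partial_blocks * peb_size_kib / 1024


# About 12 partial blocks near the start plus about 4 at the end, 512K each.
overhead_mib = padding_upper_bound_mib(12 + 4, 512)
print(overhead_mib)  # prints 8.0
```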

> 
> So, is it really worth doing what you have suggested? Skipping 0xFFed
> pages works just fine. Will the images really be much smaller?

See above. Thanks.

Darwin

> 
> -- 
> Best Regards,
> Artem Bityutskiy (Артём Битюцкий)