UBIFS and hardware ECC of all FF pages of MLC NAND

Artem Bityutskiy dedekind at infradead.org
Tue Sep 29 11:42:16 EDT 2009


On Tue, 2009-09-29 at 06:26 -0700, Darwin Rambo wrote:
> Artem,
> 
> One thing you might add is a paranoid check for the OOB being set to 0xFF before 
> programming a page. If someone programs trailing pages in a block of 0xFF by mistake, 
> and puts a non-0xFF ECC in the OOB, then the UBIFS code would write to an already 
> written ECC, which I have found to corrupt other blocks' ECCs on my part. It also gives 
> strange error messages and refuses to mount on reboot. The messages do not look like 
> they are related to the original ECC write problem so it is harder to debug.

Do you mean extending the 'ubi_dbg_check_all_ff()' check to also read
the OOB and make sure it contains only 0xFF bytes? Well, it might be
useful, but I would prefer to get a patch from someone rather than
implement this myself. :-)
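
To make the idea concrete, here is a minimal sketch of such a check in
plain C. It is not the actual UBI code: the function names
buf_is_all_ff() and page_is_erased() are hypothetical, and it assumes
the caller has already read the page data and the OOB into buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every byte in buf[0..len) is 0xFF, 0 otherwise. */
int buf_is_all_ff(const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (buf[i] != 0xFF)
			return 0;
	return 1;
}

/*
 * Hypothetical helper: treat a page as safe to program only if both the
 * data area and the OOB area still read back as erased (all 0xFF).
 * A page whose data is all 0xFF but whose OOB already holds ECC bytes
 * fails this check.
 */
int page_is_erased(const uint8_t *data, size_t data_len,
		   const uint8_t *oob, size_t oob_len)
{
	return buf_is_all_ff(data, data_len) && buf_is_all_ff(oob, oob_len);
}
```

A debugging check along these lines would catch the "all-0xFF data,
non-0xFF ECC" case before the page is programmed a second time.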

> With this particular error, you can see messages like below:
> 
> UBIFS error (pid 245): ubifs_read_node: bad node type (255 but expected 2)
> UBIFS error (pid 245): ubifs_read_node: bad node at LEB 73:456392
> UBI error: ubi_io_read: error -74 while reading 64 bytes from PEB 3:0, read 64 bytes

Well, here I can already see that the problem is at the driver level,
because the data cannot be read (-74 is -EBADMSG, which MTD returns for
uncorrectable ECC errors). Also, it would help if your driver printed an
error message on uncorrectable ECC errors.

> UBI warning: ubi_eba_init_scan: cannot reserve enough PEBs for bad PEB handling, reserved 17, need 19
> UBI warning: ubi_eba_copy_leb: error -74 while reading data from PEB 3
> UBI error: wear_leveling_worker: error -74 while moving PEB 3 to PEB 2
> UBI warning: ubi_ro_mode: switch to read-only mode
> UBI error: do_work: work failed with error code -74
> UBI error: ubi_thread: ubi_bgt0d: work failed with error code -74
> UBI error: ubi_io_read: error -74 while reading 516096 bytes from PEB 3:8192, read 516096 bytes
> UBIFS error (pid 1): ubifs_scan: corrupt empty space at LEB 1:8192
> UBIFS error (pid 1): ubifs_scanned_corruption: corrupted data at LEB 1:8192
> UBIFS error (pid 1): ubifs_scan: LEB 1 scanning failed
> UBI error: ubi_io_read: error -74 while reading 516096 bytes from PEB 3:8192, read 516096 bytes
> UBIFS error (pid 1): ubifs_recover_master_node: failed to recover master node

... snip ...

> A better error message would say something like:
> "UBI error: Data page incorrectly programmed to all 0xFFs with non-0xFF ECC."

Probably, but that would happen only if you have debugging checks
enabled, right?

> Another suggestion is, rather than creating large files stuffed with 0xFF pads at the 
> end of some of the blocks, to have a ubinize option which creates a download header 
> in front of each block with block length and valid data length. Then the 0xFF's 
> wouldn't have to be carried around and the user would be less likely to program 
> 0xFF's by mistake. They would typically only program the useful data that is in 
> the file instead, and since they erased the block to program, the trailing 0xFFs
> would be taken care of automatically. Of course, this would require custom flasher
> changes to accommodate. Thanks.

It is doable, but I can predict that other people will then complain:
why the heck can't they use simple nandwrite when flashing UBI images?
And for the many people whose HW has no problems with writing 0xFFs,
plain nandwrite is usable.

But how many 0xFFs are there? There should not be that many. We pad
special areas like the UBIFS log, the UBI volume table, the UBIFS lprops
area, and the UBIFS master area with 0xFF, but that is it. Your _data_,
i.e., the FS contents, is not "stuffed with 0xFFs"; only those special
UBIFS areas are.

So, is it really worth doing what you have suggested? Skipping 0xFFed
pages works just fine. Will the images really be much smaller?
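
For reference, the skipping can be sketched roughly as below. This is
only an illustration, not code from nandwrite or any real flasher: the
function name pages_to_program() is made up, and it assumes the flasher
holds one eraseblock worth of image data in memory and knows the page
size. It returns how many leading pages of the block need programming,
so trailing all-0xFF pages are simply left erased.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the number of leading pages of one
 * eraseblock image that must be programmed, i.e. everything up to and
 * including the last page that contains a non-0xFF byte. Trailing
 * all-0xFF pages are not counted, so the flasher can skip them.
 */
size_t pages_to_program(const uint8_t *block, size_t pages_per_block,
			size_t page_size)
{
	size_t page = pages_per_block;

	while (page > 0) {
		const uint8_t *p = block + (page - 1) * page_size;
		size_t i;

		for (i = 0; i < page_size; i++)
			if (p[i] != 0xFF)
				return page;	/* last page with real data */
		page--;
	}
	return 0;	/* the whole block is 0xFF, nothing to program */
}
```

A flasher would erase the block, program only the first
pages_to_program() pages, and leave the rest untouched.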

-- 
Best Regards,
Artem Bityutskiy (Артём Битюцкий)



