mtd/fs/jffs2 erase.c,1.84,1.85 roll back

Tue Sep 20 15:49:19 EDT 2005

Artem B. Bityutskiy wrote:
> Let's analyse the situation.
> 
> JFFS2 issues the erase command, MTD returns OK, but the eraseblock does 
> not contain all 0xFFs. Is this JFFS2's fault? No, this is the driver's 
> fault. The driver must not return success in this case. It must return 
> an error.
> 
> And the check in JFFS2 that the eraseblock contains all FFs is *wrong*. 
> That's not JFFS2's business at all. That's MTD's business. MTD 
> guarantees that it either erases the eraseblock or returns error. My 
> opinion is that this check must go to MTD. Perhaps you could wrap it 
> in #ifdefs; that is an implementation detail.
> 
> So, JFFS2 does what it is not supposed to do at all. It slows down 
> things by this. I wanted to remove the check, but dwmw2 loudly 
> complained. He wants JFFS2 to be as reliable as possible.
It would be nice if the MTD layer could be trusted :)
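
For reference, the check under discussion amounts to scanning the block that was read back for any byte other than 0xFF. A minimal sketch in plain C, assuming the eraseblock has already been read into a buffer; first_unerased_byte() is a made-up name for illustration, not an MTD or JFFS2 function:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of MTD or JFFS2): scan a buffer that was
 * read back from a freshly erased eraseblock.
 * Returns the offset of the first byte that is not 0xFF, or len if the
 * whole buffer looks erased.
 */
static size_t first_unerased_byte(const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0xFF)
			return i;
	}
	return len;
}
```

Whichever layer runs it, the caller would treat a return value smaller than len as a failed erase.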

> Ok, now about the rollback. If we return -EIO, JFFS2 will mark the block 
> as bad. Imagine your driver is broken, and the block is not actually 
> bad. It's just the driver. JFFS2 will mark nearly all eraseblocks as bad. 
> What will you do next? In case of NAND, this assumes you're in trouble 
> unless you have a list of real bad blocks.
> 
> So, this is why I rolled back. Probably the right approach will be to 
> add such blocks to the JFFS2 in-RAM bad block list but not mark them as 
> bad physically. But this is a distinct activity. I didn't explore this.
> 
> That "fix" slipped there accidentally. I didn't test it. So I removed 
> it. If you are energetic enough, work this problem out please :-)
> 

So if we return -EIO, we end up in jffs2_erase_failed(). For NAND, if 
the MTD driver reported the erase failure (bad_offset != 0xffffffff), a 
retry is made before giving up on the block and marking it bad. But there 
are other ways to fail: the read-back may have failed, the erase may have 
failed without MTD noticing, or the write of the clean marker may have 
failed. As it stands, these cases put the block on the bad_list without 
any retry. On NOR the block is also put on the bad_list at once.
How about retrying in all of these cases?

I'm thinking something like this patch...
(It does not check that a device-level failure occurred both times before 
marking a NAND block physically bad, so it is flawed.)
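
To make the intended policy concrete, here is a stand-alone sketch in plain C. It is not the attached patch and not JFFS2 code. All names (erase_failure, erase_action, block_state, on_erase_failure) are invented for illustration, and the rule "retry once, then mark NAND physically bad only if every failure was reported by the device" is my assumption about how the flaw above could be closed:

```c
#include <stdbool.h>

/* Why the erase path failed (hypothetical classification). */
enum erase_failure {
	FAIL_DEVICE_ERASE,	/* MTD driver reported the erase failure */
	FAIL_READBACK,		/* reading the block back failed */
	FAIL_NOT_ALL_FF,	/* erase "succeeded" but data is not all 0xFF */
	FAIL_CLEANMARKER,	/* writing the clean marker failed */
};

/* What to do with the block next. */
enum erase_action {
	ACT_RETRY,		/* queue the block for another erase */
	ACT_BAD_LIST_RAM,	/* in-RAM bad list only, flash untouched */
	ACT_MARK_BAD_FLASH,	/* mark the NAND block bad physically */
};

/* Per-block failure counters, zeroed when an erase succeeds. */
struct block_state {
	unsigned int failures;		/* all failures so far */
	unsigned int device_failures;	/* failures reported by the device */
};

static enum erase_action on_erase_failure(struct block_state *st,
					  enum erase_failure why,
					  bool is_nand)
{
	st->failures++;
	if (why == FAIL_DEVICE_ERASE)
		st->device_failures++;

	/* First failure of any kind: always give the block one retry. */
	if (st->failures < 2)
		return ACT_RETRY;

	/* Only trust a physical bad mark if the device itself failed
	 * every time; a broken driver should not cost us good blocks. */
	if (is_nand && st->device_failures == st->failures)
		return ACT_MARK_BAD_FLASH;

	return ACT_BAD_LIST_RAM;
}
```

With this rule a NOR block, or a NAND block whose failures were not all reported by the device, only ever lands on the in-RAM list, so a buggy driver cannot cause good blocks to be marked bad on flash.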
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: erase_retry.patch
Url: http://lists.infradead.org/pipermail/linux-mtd/attachments/20050920/ed1728fe/attachment.pl