suspect UBIFS async operations causing issues during reboot

Mon Nov 10 01:08:39 PST 2014

On 10.11.2014 at 09:44, Ricard Wanderlof wrote:
> 
> On Sun, 9 Nov 2014, Richard Weinberger wrote:
> 
>> On 07.11.2014 at 18:31, Scott Branden wrote:
>>> On 14-11-07 12:45 AM, Richard Weinberger wrote:
>>>> On 06.11.2014 at 22:56, Scott Branden wrote:
>>>>> It looks like the erase happening in the middle of reboot was uncovered in 2009 and never addressed properly?
>>>>>
>>>>> https://lkml.org/lkml/2009/6/9/16
>>>>> https://lkml.org/lkml/2010/2/12/144
>>>>>
>>>>> Was there a proper resolution to this issue?
>>>>
>>>> Did you read the threads you've posted?
>>>>
>>>> There are two answers:
>>>> https://lkml.org/lkml/2010/2/12/143
>>> Yes, there is no hardware solution to a reset happening in the middle of an erase operation to NAND.
>>
>> Well, I agree with David that anything we do in software will only hide the real problem
>> or trim down the window.
> 
> There's something I don't understand here. It could be (and probably will 
> prove to be) my lack of knowledge on the detailed workings of UBI.
> 
> Back in jffs2 days, erased blocks were so indicated by writing a 
> 'cleanmarker' pattern to the OOB area. Thus, when scanning the flash, if a 
> block was encountered which appeared erased but lacked the cleanmarker, it 
> was re-erased just in case the previous erase was interrupted and 
> therefore did not leave the bits in a properly erased state.
> 
> With ubifs, cleanmarkers are not used (partly because MLC flashes wouldn't 
> support two writes to the OOB area: one for the cleanmarker and one for 
> the ECC), but there _is_ a header at the start of each PEB. Thus the same 
> situation really holds, if a (seemingly) erased PEB is encountered with no 
> EC header, it could be considered the leftover of an unfinished erase 
> operation. I don't know for a fact if (or how) UBI does this though.
> 
> Of course, an interrupted erase operation could leave a block in a 
> seemingly un-erased state, i.e. the data appears intact (but may not be). 
> But in that case the block would already be superseded by another block 
> (i.e. any potential data would have already been copied to another block 
> with the header info invalidating the old one). So in this case the block 
> would go on an erase list at some point because it is no longer valid.
> 
> Since interrupted erase seems to be so much of a concern I've obviously 
> missed something above. But I can't figure out what.
> 
> The only thing that seems relevant among the links above is
> 
> https://lkml.org/lkml/2010/2/12/144
> 
> which indicates that half-erased blocks might cause problems with certain 
> boot loaders, but again, that's a problem with the bootloader, not UBI.

Correct. UBI can deal with that. If some component in your "NAND-Chain" does not, it
needs fixing.
Changing UBI/MTD in a way that hides such issues is not a good solution IMHO.
In the old thread the idea was rejected by both the UBI and the MTD maintainer.
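
To make the scan-time idea Ricard describes concrete, here is a rough sketch.
It is a toy Python model, not UBI's code: classify_peb and Verdict are made-up
names, and only the EC header magic "UBI#" is taken from UBI's on-flash format.
The real attach code does more (header CRC, VID header, sequence numbers).

```python
from enum import Enum

# Magic at the start of UBI's erase counter (EC) header.
EC_MAGIC = b"UBI#"


class Verdict(Enum):
    """Hypothetical outcomes of scanning one PEB (names are illustrative)."""
    KEEP = "valid-looking EC header, keep scanning this PEB"
    RE_ERASE = "looks erased but has no EC header, erase it again"
    CORRUPT = "neither erased nor a valid header, needs further handling"


def classify_peb(data: bytes) -> Verdict:
    """Toy scan-time decision for the first bytes read from one PEB.

    Assumption: checking the magic alone stands in for full header
    validation, which a real implementation would not rely on.
    """
    if data[:4] == EC_MAGIC:
        return Verdict.KEEP
    if data.count(0xFF) == len(data):
        # All 0xFF: an erase may have been interrupted before the header
        # was written, so the bits cannot be trusted to be cleanly erased.
        return Verdict.RE_ERASE
    return Verdict.CORRUPT


if __name__ == "__main__":
    print(classify_peb(b"UBI#" + bytes(60)).name)
    print(classify_peb(b"\xff" * 64).name)
    print(classify_peb(b"\xff" * 10 + b"\x7f" + b"\xff" * 53).name)
```

The point is only that a PEB which looks blank but carries no header is never
used as-is; it goes back through erase first.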

Thanks,
//richard