suspect UBIFS async operations causing issues during reboot
Scott Branden
sbranden at broadcom.com
Thu Nov 6 13:56:53 PST 2014
It looks like the problem of an erase happening in the middle of a reboot
was uncovered in 2009 and never addressed properly:
https://lkml.org/lkml/2009/6/9/16
https://lkml.org/lkml/2010/2/12/144
Was there a proper resolution to this issue?
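
To make the ordering problem concrete, here is a toy userspace model. It is
not UBI code: the names erase_worker, worker_step, worker_stop and
count_interrupted are made up for illustration. It only shows that a reboot
taken while a background erase is in flight leaves a half-erased block,
whereas stopping the worker first does not.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states of one erase block in this toy model. */
enum block_state { BLOCK_DIRTY, BLOCK_ERASING, BLOCK_ERASED };

struct erase_worker {
	enum block_state *blocks; /* caller-owned array of block states */
	size_t nblocks;
	size_t next;              /* index of the block the worker handles next */
	bool stopped;             /* set once the worker has been shut down */
};

/* One step of the background worker: start an erase, or finish the one in flight. */
static void worker_step(struct erase_worker *w)
{
	if (w->stopped || w->next >= w->nblocks)
		return;
	if (w->blocks[w->next] == BLOCK_DIRTY) {
		w->blocks[w->next] = BLOCK_ERASING;
	} else {
		w->blocks[w->next] = BLOCK_ERASED;
		w->next++;
	}
}

/* Orderly shutdown: let an in-flight erase complete, then accept no more work. */
static void worker_stop(struct erase_worker *w)
{
	if (!w->stopped && w->next < w->nblocks &&
	    w->blocks[w->next] == BLOCK_ERASING) {
		w->blocks[w->next] = BLOCK_ERASED;
		w->next++;
	}
	w->stopped = true;
}

/* Number of blocks that would be left half-erased if the reboot happened now. */
static size_t count_interrupted(const struct erase_worker *w)
{
	size_t n = 0;

	for (size_t i = 0; i < w->nblocks; i++)
		if (w->blocks[i] == BLOCK_ERASING)
			n++;
	return n;
}
```

In this model, calling worker_step() once and then "rebooting" gives
count_interrupted() == 1; calling worker_stop() before the reboot brings it
back to 0. Whether the real shutdown path has an equivalent of worker_stop()
is exactly the open question.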
On 14-11-05 02:52 PM, Scott Branden wrote:
> On 14-11-05 10:21 AM, Richard Weinberger wrote:
>> Hi!
>>
>> On 05.11.2014 at 18:56, Scott Branden wrote:
>>> Hi Richard,
>>>
>>> Thanks for the feedback. Comments inline.
>>>
>>> On 14-11-05 01:22 AM, Richard Weinberger wrote:
>>>> On Wed, Nov 5, 2014 at 9:32 AM, Scott Branden
>>>> <sbranden at broadcom.com> wrote:
>>>>> We are doing reboot testing with UBIFS on the 3.10 kernel with a new
>>>>> chipset we are working on.
>>>>>
>>>>> Over thousands of reboots we eventually find that the NAND has
>>>>> uncorrectable ECC errors reported on a random page when it is mounted.
>>>>>
>>>>> We have found the problem is that a NAND erase operation is in progress
>>>>> when the reboot occurs. Since the NAND is in the middle of the erase
>>>>> operation, the page is mostly FF with some random bits not erased when
>>>>> the reboot occurs.
>>>>>
>>>>> We suspect the problem is the asynchronous nature of the UBIFS
>>>>> operations. Perhaps the small write buffer that can take 3-5 seconds to
>>>>> be written, or some other operation occurring in UBI/UBIFS? I don't
>>>>> think the shutdown of the filesystem is dealing with all the threads
>>>>> properly.
>>>>
>>>> And what about powercuts?
>>> Powercuts would exhibit the exact same behaviour as we are observing:
>>> the erase is interrupted by loss of power, so the NAND block being
>>> erased would be in a partially erased state. Powercuts have little to
>>> do with the reboot sequence I am describing.
>>>
>>>> UBI/UBIFS was designed to survive powercuts.
>>> Yes, this does not cause UBIFS to fail to survive the powercut. It
>>> does cause blocks to not be erased properly.
>>
>> Makes sense.
>>
>>> The block that didn't finish erasing is uncorrectable on next boot-up:
>>>
>>> [ 1.330000] UBI: attaching mtd7 to ubi0
>>> [ 2.000000] iproc_nand 18046000.nand: uncorrectable error at 0x18700000
>>>
>>> The issue is that these blocks shouldn't be corrupted in the first place
>>> if UBI/UBIFS shuts down properly.
>>>
>>>> If your NAND shows strange issues even after a clean reboot, something
>>>> nasty is going on. Does your driver pass all UBI/MTD tests?
>>>>
>>> We are in the process of running the MTD tests. But this appears to
>>> have nothing to do with whether the driver is buggy. The NAND driver
>>> will do what it is told to do. If it is told to erase a block, it will
>>> erase a block. It can't control whether the system reboots in the
>>> middle of this operation.
>>>
>>> This appears to be a UBI/UBIFS issue. UBI/UBIFS operations are still
>>> going on after the filesystem is unmounted. The shutdown process
>>> completes and a reboot happens. My guess is these operations are due to
>>> the asynchronous threads of UBI/UBIFS not being handled properly during
>>> the shutdown process?
>>>
>>> I have found other people have reported unexplained flash corruption.
>>> We backported this commit to the 3.10 kernel, which solved most of the
>>> flash corruption issues:
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/super.c?id=807612db2f9940b9fa6deaef054eb16d51bd3e00
>>>
>>> The only remaining flash corruption issue is due to the described
>>> issue of a reboot happening in the middle of an erase cycle.
>>
>> You can verify your hypothesis easily. Add a printk() to
>> ubi_detach_mtd_dev(). This function shuts down UBI and also the
>> background thread which does all erase work.
> Hi Richard,
>
> The printk never happens.
>
> The only caller of ubi_detach_mtd_dev I can find is ubi_exit. But
> ubi_exit is only called if UBI is built as a module...
>
> static void __exit ubi_exit(void)
> {
> 	int i;
>
> 	for (i = 0; i < UBI_MAX_DEVICES; i++)
> 		if (ubi_devices[i]) {
> 			mutex_lock(&ubi_devices_mutex);
> 			ubi_detach_mtd_dev(ubi_devices[i]->ubi_num, 1);
> 			mutex_unlock(&ubi_devices_mutex);
> 		}
> 	ubi_debugfs_exit();
> 	kmem_cache_destroy(ubi_wl_entry_slab);
> 	misc_deregister(&ubi_ctrl_cdev);
> 	class_remove_file(ubi_class, &ubi_version);
> 	class_destroy(ubi_class);
> }
> module_exit(ubi_exit);
>
>>
>> Thanks,
>> //richard
>>
>
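
For anyone trying to spot the symptom from the original report (a page that
is "mostly FF with some random bits not erased") in a raw dump, here is a
small userspace sketch. It assumes the page has already been read into a
buffer without ECC correction. The helper names and the max_stray_bits
threshold are my own assumptions for illustration, not anything from the
MTD or UBI code.

```c
#include <stddef.h>
#include <stdint.h>

/* Count bits that read back as 0. A fully erased NAND page is all 0xFF. */
static size_t count_zero_bits(const uint8_t *buf, size_t len)
{
	size_t zeros = 0;

	for (size_t i = 0; i < len; i++) {
		uint8_t inv = (uint8_t)~buf[i]; /* 1 where the bit is not erased */

		while (inv) {
			inv &= (uint8_t)(inv - 1); /* clear the lowest set bit */
			zeros++;
		}
	}
	return zeros;
}

enum page_state { PAGE_ERASED, PAGE_PARTIALLY_ERASED, PAGE_PROGRAMMED };

/*
 * Heuristic classification: a handful of stray 0 bits in an otherwise
 * all-FF page suggests an interrupted erase. max_stray_bits is an
 * arbitrary, caller-chosen threshold.
 */
static enum page_state classify_page(const uint8_t *buf, size_t len,
				     size_t max_stray_bits)
{
	size_t zeros = count_zero_bits(buf, len);

	if (zeros == 0)
		return PAGE_ERASED;
	if (zeros <= max_stray_bits)
		return PAGE_PARTIALLY_ERASED;
	return PAGE_PROGRAMMED;
}
```

A page of 2048 bytes of 0xFF with two flipped bits would be classified as
PAGE_PARTIALLY_ERASED with a threshold of 8. The threshold has to be tuned
per device, and this heuristic cannot tell an interrupted erase apart from
ordinary bitflips on an erased page.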