Data integrity check after UBIFORMAT? Bad image sequence number error.

Tue Jun 9 02:05:55 PDT 2015

On 09.06.2015 at 10:52, t kevin wrote:
> Hi Richard
> 
> Thanks for the reply. See inline comments below.
> 
> 2015-06-09 16:20 GMT+08:00 Richard Weinberger <richard.weinberger at gmail.com>:
>> On Tue, Jun 9, 2015 at 10:02 AM, t kevin <kevint324 at gmail.com> wrote:
>>> Hi,
>>>
>>> We are using kernel 2.6.36 and mtd-utils 1.5.1 on our box.
>>> During system upgrades, very occasionally (maybe 1 in 100), I
>>> get this error from ubiattach after ubiformat.
>>>
>>> [ 1632.520000] UBI: attaching mtd8 to ubi0
>>> [ 1632.520000] UBI: physical eraseblock size: 131072 bytes (128 KiB)
>>> [ 1632.530000] UBI: logical eraseblock size: 126976 bytes
>>> [ 1632.530000] UBI: smallest flash I/O unit: 2048
>>> [ 1632.540000] UBI: sub-page size: 512
>>> [ 1632.540000] UBI: VID header offset: 2048 (aligned 2048)
>>> [ 1632.550000] UBI: data offset: 4096
>>> [ 1633.190000] UBI error: process_eb: bad image sequence number 559476870 in PEB 635, expected 139654706
>>>
>>> I understand that ubiformat generates a random sequence number and
>>> writes it to every PEB. So it seems the expected sequence number
>>> somehow is not written to the NAND flash correctly.
>>
>> Are you sure about that?
>> Can it be that 559476870 is the seq number of the old image and the
>> new one is too small?
>> This is one of the main reasons why we have that number: it lets
>> UBI detect a partially written image.
>>
> I don't really know what "559476870" is. We don't track image sequence
> numbers :(

Please start tracking them. UBI prints the number while attaching.
If the old number remains after an update, your update was most likely
not complete, and you can start investigating from there.
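
As a rough sketch of what tracking could look like, the helper below pulls
the PEB number and both sequence numbers out of the kernel log. The name
ubi_bad_seq is made up for illustration, and the pattern is taken only from
the error line quoted in this thread, so other kernel versions may word the
message differently.

```shell
#!/bin/sh
# Hypothetical helper: print "PEB found expected" for every
# "bad image sequence number" error read from stdin.
# The pattern matches the message quoted in this thread only.
ubi_bad_seq() {
    sed -n 's/.*bad image sequence number \([0-9][0-9]*\) in PEB \([0-9][0-9]*\), expected \([0-9][0-9]*\).*/\2 \1 \3/p'
}

# Typical use on the target:
#   dmesg | ubi_bad_seq
```

Logging this output after each upgrade, next to the number the new image
was supposed to carry, shows whether the unexpected number belongs to the
previous image. ubiformat also has a -Q / --image-seq option to choose the
number explicitly instead of at random, which may make old and new images
easier to tell apart.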

>>> So I changed my upgrade sequence like below
>>>
>>> ubiformat /dev/mtdx -f ubi.img
>>> ubiattach -p /dev/mtdx
>>>
>>> if [ "$?" != "0" ]; then
>>>     # ubiattach failed, so run ubiformat again
>>>     ubiformat /dev/mtdx -f ubi.img
>>> fi
>>
>> You format it while it is attached?
>>
> 
> I re-format only when ubiattach fails, which tells me that something
> went wrong during ubiformat. So at that point the device is not
> attached.

Right you are. :)

>>> My questions are:
>>> 1. What could possibly go wrong and cause ubiformat to fail?
>>
>> It can be a faulty MTD driver, a usage error, anything really.
>>
>>> 2. Is there a way to verify the data integrity after a UBIFORMAT
>>> process? Something like "mtd verify" function.
>>
>> I fear the answer is "no".
>>
> 
> As I mentioned, the error is very rare, but it has happened multiple
> times. So we are considering a data integrity check.

I suspect that sometimes the MTD partition is not written in full.
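
One way to test that suspicion, sketched here under several assumptions, is
to read the partition back into a file after ubiformat (for example with
nanddump, without OOB data) and compare the image sequence number stored in
each PEB's erase counter header. The function names are made up. The layout
assumed is the one in the kernel's ubi-media.h: the magic "UBI#" at offset 0
and a big-endian 32-bit image_seq at offset 24 of each PEB. The default PEB
size of 131072 comes from the log above. The dump must keep PEBs aligned,
so skipped bad blocks would shift the numbering.

```shell
#!/bin/sh
# Read 4 bytes at byte offset $2 of file $1, printed as 8 hex digits.
read_hex4() {
    od -An -tx1 -j "$2" -N 4 "$1" | tr -d ' \n'
}

# Hypothetical check: list every PEB in dump file $1 whose EC header
# is missing or carries an image_seq different from $2.
# $3 is the PEB size in bytes (assumed 131072 if omitted).
check_image_seq() {
    dump=$1 expected=$2 peb_size=${3:-131072}
    size=$(( $(wc -c < "$dump") ))
    peb=0
    while [ $((peb * peb_size)) -lt "$size" ]; do
        off=$((peb * peb_size))
        if [ "$(read_hex4 "$dump" "$off")" != "55424923" ]; then
            # No "UBI#" magic: erased, bad, or never written.
            echo "PEB $peb: no EC header"
        else
            # image_seq is assumed to sit at offset 24, big-endian.
            seq=$(printf '%d' "0x$(read_hex4 "$dump" $((off + 24)))")
            [ "$seq" -eq "$expected" ] || echo "PEB $peb: image_seq $seq"
        fi
        peb=$((peb + 1))
    done
}
```

With a dump in mtd8.bin, "check_image_seq mtd8.bin 139654706" would list
the PEBs that still carry another number. This is meant to run on a host
with a full od; a cut-down od on the target may lack -j and -N.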

Just in case, does your MTD driver pass all mtd and UBI tests?
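
For the MTD side, the kernel ships test modules under drivers/mtd/tests
that take the MTD device number as a dev= parameter. The sketch below only
prints the commands instead of running them, because most of these tests
erase and rewrite the partition. The function name and the choice of tests
are illustrative, not a complete list.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the commands that would run a few
# in-kernel MTD tests against MTD device number $1.
# WARNING: apart from readtest, these tests destroy the partition contents.
mtd_test_cmds() {
    for t in readtest pagetest subpagetest oobtest stresstest; do
        echo "modprobe mtd_$t dev=$1"
        echo "rmmod mtd_$t"
    done
}
```

On a scratch partition, "mtd_test_cmds 8 | sh" would run them in turn; the
results show up in the kernel log. The UBI tests are user-space programs in
the mtd-utils source tree.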

Thanks,
//richard