OOB Test fails

Danesh Daroui Danesh.Daroui at ascom.com
Thu Oct 27 03:51:14 PDT 2016


Hi Boris,

Thanks for your help. We would really like to upgrade the kernel, and that is of course the wise approach. However, since a kernel upgrade is a time-consuming and cumbersome task (though definitely necessary, as you mentioned), we would first like to know whether the problem is due to the outdated kernel or to the hardware. Right now I am trying to run the UBIFS tests that are included in "mtd_utils". I hope these tests will give me some hints as to whether there is a problem in the UBI/UBIFS layers. I had previously written my own stress test, which tests the memory at the POSIX level (more or less the same level as the UBI/UBIFS layers), and I experienced some crashes but could not identify the cause. For instance, I could not find out whether a crash was due to a bug in the driver, in the file system, etc.
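
For reference, a POSIX-level write/read-back check of the kind mentioned above could look roughly like the sketch below. This is not the actual stress test; the function names (`pattern_chunk`, `write_pattern`, `verify_pattern`), the chunk size and the seed are hypothetical, and it assumes the target path is on a mounted UBIFS volume. Note that a read immediately after a write may be served from the page cache, so a real test would remount or drop caches before verifying.

```python
import hashlib
import os


def pattern_chunk(seed, index, chunk_size):
    """Return a deterministic chunk of exactly chunk_size bytes."""
    block = hashlib.sha256(f"{seed}:{index}".encode()).digest()  # 32 bytes
    # Repeat the 32-byte block and trim to the requested size.
    return (block * (chunk_size // len(block) + 1))[:chunk_size]


def write_pattern(path, chunks, chunk_size=4096, seed=0):
    """Write `chunks` deterministic chunks to `path` and force them to storage."""
    with open(path, "wb") as f:
        for i in range(chunks):
            f.write(pattern_chunk(seed, i, chunk_size))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, chunks, chunk_size=4096, seed=0):
    """Read the file back and return the indices of chunks that do not match."""
    bad = []
    with open(path, "rb") as f:
        for i in range(chunks):
            if f.read(chunk_size) != pattern_chunk(seed, i, chunk_size):
                bad.append(i)
    return bad
```

Because the data is regenerated from the seed, the verify step can run after a reboot or remount, and the returned chunk indices show where in the file a mismatch occurred.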

The flash memory we are using is a Micron NAND (1 GiB, 3.3 V, 8-bit), and the driver is the one delivered with kernel 2.6.39. Have you heard of a similar problem before? Or do you want me to give you more information about the hardware and the system we have under test?

Thanks again for your help,

Danesh Daroui


-----Original Message-----
From: Boris Brezillon [mailto:boris.brezillon at free-electrons.com] 
Sent: den 27 oktober 2016 09:38
To: Danesh Daroui <Danesh.Daroui at ascom.com>
Cc: Steve deRosier <derosier at gmail.com>; linux-mtd at lists.infradead.org
Subject: Re: OOB Test fails

Hi Danesh,

On Wed, 26 Oct 2016 16:28:43 +0000
Danesh Daroui <Danesh.Daroui at ascom.com> wrote:

> Hi Steve,
> 
> Thank you for your prompt answer. When I run the OOB test (mtd_oobtest), for instance, one of the devices always returns a "verification failed" error at a certain address. This is all we know and all that the test reports. We use quite an old kernel, i.e. 2.6.39, and an outdated kernel is one of the things we suspect as a source of the problem. We are also considering a hardware failure, since on some devices the OOB test shows no errors, while on others it shows more errors and the failing address sometimes changes randomly.

Yes, please, try with a newer kernel: I won't help debugging such an old thing.

> 
> Our main problem is that UBIFS sometimes forces the device into read-only mode due to a "bad CRC" error at startup, when the device is booted. I am now running the file system tests that are in "mtd_utils". I have started with two tests, "simple/test_1" and "simple/test_2", which simply write until the drive is full and then read the data back and verify its correctness. During the test, I see lots of:
> 
> UBI: scrubbed PEB 585 (LEB 3:770), data moved to PEB 1772
> UBI: scrubbed PEB 1045 (LEB 3:1261), data moved to PEB 828
> UBI: scrubbed PEB 1493 (LEB 3:664), data moved to PEB 814
> UBI: scrubbed PEB 751 (LEB 3:1260), data moved to PEB 1772
> 
> In my mind, this points to problematic hardware: the data is corrupted in so many cells that UBIFS tries to move the data whenever a corruption is detected. My question is whether this guess could be valid, or whether this is mostly due to the old kernel we are using, so that upgrading to a new kernel would most likely solve the problems?

Well, I can't tell. It can be caused by a buggy NAND controller driver, a bug in the UBI layer or maybe your NAND is simply worn.
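
To compare runs across kernels or devices, the scrub messages quoted above can be tallied per PEB. The following is only a sketch: the function name `tally_scrubs` is hypothetical, and the regular expression assumes the exact message format shown in the log lines above. A tally like this does not by itself distinguish a driver bug from worn NAND; it only shows which PEBs are involved and how often.

```python
import re
from collections import Counter

# Matches lines such as:
#   UBI: scrubbed PEB 585 (LEB 3:770), data moved to PEB 1772
SCRUB_RE = re.compile(
    r"UBI: scrubbed PEB (\d+) \(LEB (\d+):(\d+)\), data moved to PEB (\d+)"
)


def tally_scrubs(log_text):
    """Count how often each PEB was scrubbed and how often each PEB was a target."""
    sources = Counter()
    targets = Counter()
    for m in SCRUB_RE.finditer(log_text):
        sources[int(m.group(1))] += 1  # PEB that was scrubbed
        targets[int(m.group(4))] += 1  # PEB the data was moved to
    return sources, targets
```

Feeding it the kernel log (for example the saved output of `dmesg`) returns two counters, one keyed by scrubbed PEB and one by destination PEB.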

Try with a newer kernel, and let's see what the MTD tests and MTD utils tests say.

BTW, which NAND and NAND controller are you testing on?

Regards,

Boris


