Wear-leveling peculiarities

Johannes Bauer weolanwaybqm at spornkuller.de
Mon May 18 06:53:51 PDT 2015


Hello list,

I keep track of some devices running an embedded ARM Linux which boots 
from NAND flash. On there, ubifs is used. The deployed kernels are:

Linux version 3.0.59 (###@###) (gcc version 4.5.4 20120305 (prerelease) 
(GCC) ) #1 Mon Apr 29 16:36:42 CEST 2013

Target is ARMv7 (omap2).

The units have been deployed for three years now. Recently, we've been 
seeing units fail more often. This warranted some investigation. I 
pulled dd images of the relevant /dev/mtd device (mtd4 in my case) and 
wrote a small Python script that evaluated the per-block UBI 
erase-counter (EC) headers, in particular the erase count. I expected to 
see a roughly uniform distribution of erases across the flash.
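For reference, the kind of parsing such a script does can be sketched 
like this. This is a minimal sketch, not the actual script: it assumes 
the dump is a raw UBI partition image in which each physical erase block 
(PEB) begins with a UBI erase-counter header (4-byte magic "UBI#", a 
version byte, 3 padding bytes, then a big-endian 64-bit erase counter), 
and it assumes a PEB size of 128 KiB by default:

```python
import struct

UBI_EC_HDR_MAGIC = b"UBI#"  # magic at the start of every UBI EC header

def erase_counts(image, peb_size=128 * 1024):
    """Return the erase counter of each physical erase block (PEB).

    `image` is a raw dump of the UBI partition (e.g. dd of /dev/mtdX).
    Blocks whose magic does not match (e.g. bad or erased blocks)
    produce None instead of a count.
    """
    counts = []
    for off in range(0, len(image), peb_size):
        hdr = image[off:off + 16]
        if hdr[:4] != UBI_EC_HDR_MAGIC:
            counts.append(None)
            continue
        # The 64-bit big-endian erase counter sits at offset 8,
        # after magic (4), version (1) and padding (3).
        (ec,) = struct.unpack_from(">Q", hdr, 8)
        counts.append(ec)
    return counts
```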

Instead, we see the very opposite:

http://imgur.com/a/d5Bhl

Here you see graphs from two units. You can see that the pattern is 
identical: many blocks that were erased seldom, many that were erased 
frequently, and very little in between. This is what the histograms show 
(erase count on the X axis, number of occurrences on the Y axis).
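The histogram itself is trivial to compute from the per-block counts; a 
minimal sketch, assuming a list of integer erase counters with None 
marking unreadable blocks:

```python
from collections import Counter

def erase_histogram(counts):
    """Map erase count -> number of erase blocks with that count.

    None entries (blocks without a readable EC header) are skipped.
    A healthy, wear-leveled device should show a narrow peak here;
    a bimodal shape suggests uneven wear.
    """
    return Counter(c for c in counts if c is not None)
```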

The other graphs are even more disturbing. They show the physical layout 
of the NAND flash: each pixel corresponds to one erase block, and 
everything above 100 erases is red (the scale is linear, shown at the 
very bottom). You can see that in some areas blocks are erased very 
often, while in others they are virtually untouched.
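Picking out those hot regions from the raw counts is equally simple; a 
sketch mirroring the 100-erase red cut-off from the plots (the function 
name and default threshold are my own choices, not from the original 
script):

```python
def hot_blocks(counts, threshold=100):
    """Return indices of erase blocks erased more than `threshold` times.

    The index order matches the physical order of the blocks on flash,
    so clusters of adjacent indices correspond to the contiguous red
    areas visible in the layout plot.
    """
    return [i for i, c in enumerate(counts) if c is not None and c > threshold]
```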

This is something I'd expect if the FS performed no wear-leveling at all 
(files rewritten in-place would cause erases at the same locations over 
and over). But UBIFS should take care of this, shouldn't it? It may well 
be that my understanding of UBIFS is too limited to grasp the whole 
picture. In any case, any advice is greatly appreciated.

Thanks in advance,
Johannes




