Wear-leveling peculiarities
Johannes Bauer
weolanwaybqm at spornkuller.de
Mon May 18 06:53:51 PDT 2015
Hello list,
I keep track of some devices running an embedded ARM Linux which boots
from NAND flash. On there, ubifs is used. The deployed kernels are:
Linux version 3.0.59 (###@###) (gcc version 4.5.4 20120305 (prerelease)
(GCC) ) #1 Mon Apr 29 16:36:42 CEST 2013
Target is ARMv7 (omap2).
The units have been deployed for three years now. Recently, we've been
seeing units fail more often. This warranted some investigation. I
pulled dd images of the relevant /dev/mtd device (mtd4 in my case) and
wrote a small Python script that evaluated the UBIFS LEB headers, in
particular the erase count. I expected to see a uniform distribution of
erases all around the flash.
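For reference, a minimal sketch of the kind of parser involved: each physical eraseblock (PEB) in a UBI image starts with a 64-byte EC header whose layout follows struct ubi_ec_hdr in the mainline kernel (drivers/mtd/ubi/ubi-media.h). The 128 KiB PEB size here is an assumption; in practice it should match the erasesize of the mtd partition being dumped.

```python
# Sketch: extract UBI erase counters (EC headers) from a raw dd image.
import struct
from collections import Counter

UBI_EC_HDR_MAGIC = 0x55424923           # ASCII "UBI#"
EC_HDR = struct.Struct(">IB3sQ")        # magic, version, padding, erase counter

def erase_counts(image, peb_size=128 * 1024):
    """Yield the erase counter of every valid PEB in the image."""
    for off in range(0, len(image), peb_size):
        hdr = image[off:off + EC_HDR.size]
        if len(hdr) < EC_HDR.size:
            break
        magic, _version, _pad, ec = EC_HDR.unpack(hdr)
        if magic == UBI_EC_HDR_MAGIC:   # skip blank/bad blocks
            yield ec

def histogram(image, peb_size=128 * 1024):
    """Map erase count -> number of PEBs with that count."""
    return Counter(erase_counts(image, peb_size))
```

Feeding the resulting Counter into a plotting library gives exactly the erase-count histograms described below.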
But on the contrary, we see the very opposite:
http://imgur.com/a/d5Bhl
Here you see graphs of two units. You can see that the pattern is
identical: lots of pages that were erased rarely, lots of pages that
were erased frequently, and very little in between. This is what the
histograms show (erase count on the X axis, number of occurrences on
the Y axis).
The other graphs are even more disturbing. They show the physical
layout of the NAND flash, with each pixel corresponding to one LEB.
Everything upwards of 100 erases is red (the scale is linear, shown at
the very bottom). You can see that in some areas pages are erased very
often, while in others they stay virtually constant.
This is something I'd expect if the FS did not perform wear-leveling
(files that are written in-place cause page erases at the same locations
over and over). But ubifs should take care of this, shouldn't it? It
may well be that my understanding of ubifs is too limited and I'm not
grasping the whole picture. In any case, any advice is greatly appreciated.
Thanks in advance,
Johannes