Performance of wear-levelling in JFFS2.

David Jander david.jander at protonic.nl
Thu Nov 17 06:27:20 EST 2005


Hi,

I am wondering how good wear-levelling really is in JFFS2. I have run an 
experiment which ended with an "MTD do_write_buffer(): software timeout", 
which really looks like the flash is taking too long to write data because it 
is near the end of its life. The only thing is, although the experiment has 
already lasted quite long (in terms of the amount of data (re-)written), it 
doesn't seem anywhere near as long as expected when making some "educated 
guesses" about the performance of JFFS2. This is the experiment along with 
the results so far:

I have set up a system as described in the README of the "checkfs" tool that 
is contained in the mtd CVS source code.
The system is an MPC852T-based SBC with 32 Mbyte of Spansion MirrorBit flash 
in a single 16-bit-wide chip (S29GL256M11).
Power is yanked by a relay. Checkfs had to be fixed because it was not 
big-endian compatible (trivial fix); file size is 4...1024 bytes random, with 
100 different files being constantly rewritten (one at a time).

Until now I have the following (hopefully interesting) statistics to share:

Number of reboots so far: 18490
Number of times there was 1 crc error in a file: 66
Number of times there was more than 1 crc error in a file: 0

Total number of times a file was rewritten: 13000000 (13 million).
Size of flash partition: 6 Mbyte, df showed 9% full at the end.

So, here is the reasoning behind the above:
File data is random, the average file size is around 500 bytes, add no 
compression (random data doesn't compress) and some overhead (headers and 
such), and we get maybe some 600 bytes of new data being written on average 
each time. The directory inode also has to be rewritten, so let's say for 
simplicity that it's also around that amount of data.
Concluding assumption so far: for every rewrite, two chunks of 600 bytes each 
are added to the flash and two equally sized chunks are invalidated.

So, each eraseblock is 64k; that's around 109 such chunks per eraseblock. 
There are around 80 or so eraseblocks that can be shuffled around for 
wear-levelling, so if those are used 100% optimally (neglecting GC overhead) 
we can do 4360 file rewrites before a single eraseblock is erased for the 
second time (that's 80*109/2).
Now 13000000/4360 = 2981, which is the number of times a given eraseblock 
should have been erased under this assumption, and we already have worn-out 
blocks!
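The back-of-the-envelope estimate above can be reproduced in a few lines. 
Note that the 600-byte average chunk size, the 80 shuffleable eraseblocks and 
the two-chunks-per-rewrite rule are my assumptions from the text, not 
measured JFFS2 internals:

```python
# Rough estimate of erase cycles per block under the stated assumptions
# (not measured JFFS2 behaviour).
ERASEBLOCK_SIZE = 64 * 1024   # 64k eraseblocks
CHUNK_SIZE = 600              # assumed average bytes per written node
FREE_ERASEBLOCKS = 80         # assumed blocks available for wear-levelling
FILE_REWRITES = 13_000_000    # observed total file rewrites so far

chunks_per_block = ERASEBLOCK_SIZE // CHUNK_SIZE            # ~109
# Each rewrite adds two chunks (data node + directory inode), so one
# full pass over the free blocks absorbs this many rewrites:
rewrites_per_pass = FREE_ERASEBLOCKS * chunks_per_block // 2  # ~4360
erase_cycles = FILE_REWRITES // rewrites_per_pass             # ~2981

print(chunks_per_block, rewrites_per_pass, erase_cycles)
```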

The datasheet says 100,000 erase cycles typical. In practice it can of course 
be less, but 2981 is far below that, IMHO.

I know my assumptions are pretty simplistic, but can anyone explain how the 
results I am getting are _that_ far off?

Btw: the experiment is continuing, and more such timeouts are already showing 
up. This is what the relevant part of the logfile looks like:

----------------------------[...] --------------------------
... Creating File:file65. MTD do_write_buffer(): software timeout
Write of 68 bytes at 0x0057603c failed. returned -5, retlen 0
Not marking the space at 0x0057603c as dirty because the flash driver returned 
retlen zero
MTD do_write_buffer(): software timeout
Write of 68 bytes at 0x0057603c failed. returned -5, retlen 0
Not marking the space at 0x0057603c as dirty because the flash driver returned 
retlen zero
Error: Unable to truncate file.: Input/output error
--------------------------[...]-----------------------------

Regards,

-- 
David Jander
Protonic Holland.
