[PATCH] [JFFS2] load_xattr_datum need to return a positive number in case of unrecoverable error

Jean-Christophe DUBOIS jcd at tribudubois.net
Thu Apr 26 17:09:52 EDT 2012


On 25/04/2012 15:59, Artem Bityutskiy wrote:
> On Sun, 2012-04-22 at 19:35 +0200, Jean-Christophe DUBOIS wrote:
>>>    Also, please,
>>> tell  whether this fixes a real-life bug or you are fixing a complaint
>>> of a static analysis tools or something like this?
>>>    And please, tell how
>>> you tested it.
> I'd like to get some answers to the above questions.
Well, this patch certainly tries to fix a "real-life" bug.

It mainly intend to re-enable the xattr related self-healing code 
present in JFFS2.

So first, let's try to reproduce this bug in an emulated/controlled 
environment.

# mkdir /tmp/jffs2
# modprobe mtdblock
# modprobe nandsim first_id_byte=0x20 second_id_byte=0x33
# modprobe jffs2
# mount -t jffs2 /dev/mtdblock0 /tmp/jffs2

We create a file and add an attribute
# cp /etc/passwd /tmp/jffs2
# attr -s attr1 -V "TUTU" /tmp/jffs2/passwd

So now we are going to corrupt the attribute:
# umount /tmp/jffs2
# hexedit /dev/mtdblock0

Search for the string TUTU and replace it with tutu and save the all 
thing (Ctrl-X).

Now we remount the jffs2 file:
# mount -t jffs2 /dev/mtdblock0 /tmp/jffs2

And we check "dmesg"
# dmesg
...
[972157.159571] JFFS2 notice: (3038) jffs2_build_xattr_subsystem: 
complete building xattr subsystem, 1 of xdatum (0 unchecked, 0 orphan) 
and 1 of xref (0 dead, 0 orphan) found.

No error found so far. Ok, let's try to check the attribute:
# attr -l /tmp/jffs2/passwd
attr_list: Input/output error
Could not list "(null)" for /tmp/jffs2/passwd
# dmesg
...
[972244.173098] JFFS2 warning: (3068) do_load_xattr_datum: node CRC 
failed (JFFS2_NODETYPE_XREF) at 0xfc8600, read: 0xe46f3e72 calculated: 
0xd2fad3c6

Let's try again ...
# attr -l /tmp/jffs2/passwd
attr_list: Input/output error
Could not list "(null)" for /tmp/jffs2/passwd
#

If you unmount the file system and then remount it, you end up in the 
same situation. Nothing can cure the problem except deleting the file 
(or patching the data back to TUTU in the mtd device).

So, now we modify the jffs2 code to return positive values for 
unrecoverable error and recompile the module (run the usual make ...)

Now let's try the new jffs2 module on the corrupted mtd.
# umount /tmp/jffs2
# rmmod jffs2
# insmod /wherever/fs/jffs2/jffs2.ko
# mount -t jffs2 /dev/mtdblock0 /tmp/jffs2

Let's check for the attribute:
# attr -l /tmp/jffs2/passwd
# dmesg
...
[973190.247650] JFFS2 notice: (10804) jffs2_build_xattr_subsystem: 
complete building xattr subsystem, 1 of xdatum (0 unchecked, 0 orphan) 
and 1 of xref (0 dead, 0 orphan) found.
[973212.584130] JFFS2 warning: (10813) do_load_xattr_datum: node CRC 
failed (JFFS2_NODETYPE_XREF) at 0xfc8600, read: 0xe46f3e72 calculated: 
0xd2fad3c6

No error, this time in user space. The corrupted attribute has been 
automatically discarded from the file system and there are no attributes 
left for this file.

Let's try something else:

# attr -s attr1 -V "TITI" /tmp/jffs2/passwd
# attr -s attr1 -V "TOTO" /tmp/jffs2/passwd
# attr -g attr1 /tmp/jffs2/passwd
Attribute "attr1" had a 4 byte value for /tmp/jffs2/passwd:
TOTO

So this is the last value associated to "attr1". Now we are going to 
corrupt the attribute:
# umount /tmp/jffs2
# hexedit /dev/mtdblock0

Search for the string TOTO and replace it with toto and save the all 
thing (Ctrl-X).

Now we remount the jffs2 file:
# mount -t jffs2 /dev/mtdblock0 /tmp/jffs2

# attr -g attr1 /tmp/jffs2/passwd
Attribute "attr1" had a 4 byte value for /tmp/jffs2/passwd:
TITI

As we can see the attribute value is back to the only left good value. 
The TOTO and TUTU values have been discarded.
If we check dmesg, we find:
# dmesg:
[973559.872703] JFFS2 notice: (10913) jffs2_build_xattr_subsystem: 
complete building xattr subsystem, 3 of xdatum (0 unchecked, 0 orphan) 
and 3 of xref (0 dead, 0 orphan) found.
[973590.824723] JFFS2 warning: (10922) do_load_xattr_datum: node CRC 
failed (JFFS2_NODETYPE_XREF) at 0xfc8600, read: 0xe46f3e72 calculated: 
0xd2fad3c6
[973590.824746] JFFS2 warning: (10922) do_load_xattr_datum: node CRC 
failed (JFFS2_NODETYPE_XREF) at 0xfc8848, read: 0x08bce1ae calculated: 
0x3e290c1a

So the attributes are still allocated and linked to the file but they 
are discarded on first failed access. The value of the only valid 
attribute left is retrieved.

Is this convincing enough?

You might think that the attibute corruption I am showing here is very 
artificial but I can assure you that it does happen in the wild when you 
get a power cut on your equipment while the GC was moving an attribute 
node for example.

Regards

JC
> The reason I am
> asking is that JFFS2 is old and does not see too much maintanance
> anymore, so I feel that it is important important to be careful about
> merging something which does not fix a real life issue.





More information about the linux-mtd mailing list