error in obliterating obsoleted node, possible race?

Mon May 25 11:50:31 EDT 2009

I suspect I found a race in JFFS2, but I cannot convince myself of it. I get lots of
 Write error in obliterating obsoleted node at 0x01ea0000: -30
when I stress the FS with this loop:
  while [ 1 == 1 ] ; do rm -rf a*; cp -ap /opt/appl a1; cp -ap /opt/appl a2;cp -ap /opt/appl a3; done
/opt/appl is fairly large.

Adding this crude debug code to JFFS2:

diff --git a/fs/jffs2/nodemgmt.c b/fs/jffs2/nodemgmt.c
index 21a0529..7624ff9 100644
--- a/fs/jffs2/nodemgmt.c
+++ b/fs/jffs2/nodemgmt.c
@@ -591,7 +591,9 @@ void jffs2_mark_node_obsolete(struct jffs2_sb_info *c, struct jffs2_raw_node_ref
 		/* We didn't lock the erase_free_sem */
 		return;
 	}
-
+	{
+		unsigned long used_size = jeb->used_size;
+		unsigned long unchecked_size = jeb->unchecked_size;
 	if (jeb == c->nextblock) {
 		D2(printk(KERN_DEBUG "Not moving nextblock 0x%08x to dirty/erase_pending list\n", jeb->offset));
 	} else if (!jeb->used_size && !jeb->unchecked_size) {
@@ -674,9 +676,24 @@ void jffs2_mark_node_obsolete(struct jffs2_sb_info *c, struct jffs2_raw_node_ref
 	n.nodetype = cpu_to_je16(je16_to_cpu(n.nodetype) & ~JFFS2_NODE_ACCURATE);
 	ret = jffs2_flash_write(c, ref_offset(ref), sizeof(n), &retlen, (char *)&n);
 	if (ret) {
+		struct jffs2_unknown_node n2;
+
 		printk(KERN_WARNING "Write error in obliterating obsoleted node at 0x%08x: %d\n", ref_offset(ref), ret);
+		printk(KERN_WARNING "Used/Unchecked for ref 0x%08x: %lu:%lu\n", ref_offset(ref),
+		       used_size, unchecked_size);
+		ret = jffs2_flash_read(c, ref_offset(ref), sizeof(n), &retlen, (char *)&n2);
+		if (ret)
+			printk(KERN_WARNING "Read confirm error node at 0x%08x: %d\n", ref_offset(ref), ret);
+		else {
+			ret = memcmp(&n, &n2, sizeof(n));
+			if (ret)
+				printk(KERN_WARNING "DIFF:Read confirm node at 0x%08x: %d\n", ref_offset(ref), ret);
+			else
+				printk(KERN_WARNING "SAME:Read confirm node at 0x%08x: %d\n", ref_offset(ref), ret);
+		}
 		goto out_erase_sem;
 	}
+	}
 	if (retlen != sizeof(n)) {
 		printk(KERN_WARNING "Short write in obliterating obsoleted node at 0x%08x: %zd\n", ref_offset(ref), retlen);
 		goto out_erase_sem;

I see lots of:
flash: buffer locked error (status 0xd2)
Write error in obliterating obsoleted node at 0x01ea0000: -30
Used/Unchecked for ref 0x01ea0000: 0:0
DIFF:Read confirm node at 0x01ea0000: -32
flash: buffer locked error (status 0xd2)
Write error in obliterating obsoleted node at 0x06520000: -30
Used/Unchecked for ref 0x06520000: 0:0
DIFF:Read confirm node at 0x06520000: -32
flash: buffer locked error (status 0xd2)
Write error in obliterating obsoleted node at 0x02280000: -30
Used/Unchecked for ref 0x02280000: 0:0
DIFF:Read confirm node at 0x02280000: -32
flash: buffer locked error (status 0xd2)
Write error in obliterating obsoleted node at 0x06e60000: -30
Used/Unchecked for ref 0x06e60000: 0:0
DIFF:Read confirm node at 0x06e60000: -32

Notice that Used/Unchecked is always 0:0, so the block has already been
moved to c->erase_pending_list or c->erasable_list before the node is
marked obsolete in flash.
Is this allowed? The erase_free_sem is held, so I guess it is, but
I can't shake the feeling that one might end up writing to a block that is
already being erased.
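
For reference, the two numbers in the log can be decoded with a small
user-space sketch (not kernel code). It assumes the kernel's memcmp returned
the plain difference of the first differing bytes, as the generic
lib/string.c version does; an arch-specific memcmp may behave differently.
The helper names are made up for illustration, and 0xE001/0xE002 are the
usual dirent/inode node types, used as examples because the log does not say
which kind of node was involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Value of JFFS2_NODE_ACCURATE in include/linux/jffs2.h. */
#define JFFS2_NODE_ACCURATE 0x2000u

/* Nonzero if a kernel-style return code is -EROFS (-30 on Linux). */
static int is_erofs(int ret)
{
	return ret == -EROFS;
}

/*
 * Bytewise compare returning the difference of the first differing bytes.
 * ASSUMPTION: this mirrors what the kernel's memcmp did in the debug patch.
 */
static int bytewise_cmp(const void *a, const void *b, size_t len)
{
	const unsigned char *p = a, *q = b;
	size_t i;

	assert(a != NULL && b != NULL);
	for (i = 0; i < len; i++) {
		int d = (int)p[i] - (int)q[i];

		if (d)
			return d;
	}
	return 0;
}

/* Store a 16-bit value in the chosen byte order. */
static void store16(unsigned char out[2], uint16_t v, int big_endian)
{
	out[0] = big_endian ? (unsigned char)(v >> 8) : (unsigned char)(v & 0xff);
	out[1] = big_endian ? (unsigned char)(v & 0xff) : (unsigned char)(v >> 8);
}

/*
 * Compare the nodetype we tried to write (ACCURATE cleared) with the
 * nodetype assumed to be still on flash (ACCURATE set). The debug patch
 * does the same comparison over the whole node header.
 */
static int accurate_bit_diff(uint16_t nodetype_on_flash, int big_endian)
{
	unsigned char wanted[2], on_flash[2];

	store16(wanted, (uint16_t)(nodetype_on_flash & ~JFFS2_NODE_ACCURATE),
		big_endian);
	store16(on_flash, nodetype_on_flash, big_endian);
	return bytewise_cmp(wanted, on_flash, sizeof(wanted));
}
```

Under these assumptions accurate_bit_diff() returns -32 for either byte
order, which matches the DIFF lines, and is_erofs(-30) is true. That would be
consistent with the flash still holding the old header with the ACCURATE bit
set, i.e. the obsoleting write did not take effect.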

I know that chip status 0xd2 means that the block is locked, but I am
sure it isn't (unless JFFS2 somehow managed to lock it for me).
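
A minimal sketch of how status 0xd2 breaks down, assuming the chips use the
Intel/Sharp-style status register layout (the chip type is not stated here,
so the bit meanings below are an assumption, and the names are made up for
illustration):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: Intel/Sharp-style status register bit layout. */
#define SR_READY             0x80u /* write state machine ready */
#define SR_ERASE_SUSPENDED   0x40u /* an erase is suspended */
#define SR_ERASE_ERROR       0x20u /* erase / clear lock-bits error */
#define SR_PROGRAM_ERROR     0x10u /* program / set lock-bit error */
#define SR_VPP_LOW           0x08u /* programming voltage too low */
#define SR_PROGRAM_SUSPENDED 0x04u /* a program operation is suspended */
#define SR_BLOCK_LOCKED      0x02u /* operation hit a locked block */

struct sr_flags {
	int ready;
	int erase_suspended;
	int erase_error;
	int program_error;
	int vpp_low;
	int program_suspended;
	int block_locked;
};

/* Split a raw status byte into individual 0/1 flags. */
static void decode_status(uint8_t sr, struct sr_flags *out)
{
	assert(out != NULL);
	out->ready             = !!(sr & SR_READY);
	out->erase_suspended   = !!(sr & SR_ERASE_SUSPENDED);
	out->erase_error       = !!(sr & SR_ERASE_ERROR);
	out->program_error     = !!(sr & SR_PROGRAM_ERROR);
	out->vpp_low           = !!(sr & SR_VPP_LOW);
	out->program_suspended = !!(sr & SR_PROGRAM_SUSPENDED);
	out->block_locked      = !!(sr & SR_BLOCK_LOCKED);
}
```

With that layout, 0xd2 = 0x80 + 0x40 + 0x10 + 0x02: ready, erase suspended,
program error and block locked. If the assumption holds, the erase-suspended
bit would be worth checking against the worry about writing to a block that
is being erased, but it does not prove anything by itself.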

Note that the FS spans 4 consecutive flash chips.

   Jocke