[RFC/PATCH][NOR] Fix cfi_cmdset_0001.c FL_SYNCING race

Tue Apr 29 05:37:42 EDT 2008

Hello,

we've met another CFI driver issue with multipartitional devices.

One can easily reproduce the bug by
1. Running intensive writes on one jffs2 volume and
2. Simultaneously performing mount/unmount cycle with another jffs2 volume at the same chip.

With almost 100% probability we hit the deadlock or/and the following errors:

---
Newly-erased block contained word 0x20031985 at offset 0x006c0000
flash1: buffer write error (status 0x1d0)
jffs2_flush_wbuf(): Write failed with -22
Write of 3681 bytes at 0x006c0400 failed. returned -22, retlen 0
Not marking the space at 0x006c0400 as dirty because the flash driver returned retlen zero
Error writing new dnode: -22
Error garbage collecting node at 00254700!
No space for garbage collection. Aborting GC thread
---

We found this issue was related to FL_SYNCING state race with other operations.
The patch below tries to remove these races.

Any comments on this patch are welcome.

---
Signed-off-by: Alexander Belyakov <abelyako at googlemail.com>


diff -uNrp a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c

--- a/drivers/mtd/chips/cfi_cmdset_0001.c	2008-04-24 02:34:28.000000000 +0400
+++ b/drivers/mtd/chips/cfi_cmdset_0001.c	2008-04-28 16:41:03.000000000 +0400
@@ -702,6 +702,10 @@ static int chip_ready (struct map_info *
 	struct cfi_pri_intelext *cfip = cfi->cmdset_priv;
 	unsigned long timeo = jiffies + HZ;
 
+	/* Prevent setting state FL_SYNCING for chip in suspended state. */
+	if (FL_SYNCING == mode && FL_READY != chip->oldstate)
+		goto sleep;
+
 	switch (chip->state) {
 
 	case FL_STATUS:
@@ -807,8 +811,9 @@ static int get_chip(struct map_info *map
 	DECLARE_WAITQUEUE(wait, current);
 
  retry:
-	if (chip->priv && (mode == FL_WRITING || mode == FL_ERASING
-			   || mode == FL_OTP_WRITE || mode == FL_SHUTDOWN)) {
+	if (chip->priv &&
+	    (mode == FL_WRITING || mode == FL_ERASING || mode == FL_OTP_WRITE 
+	    || mode == FL_SHUTDOWN ) && chip->state != FL_SYNCING) {
 		/*
 		 * OK. We have possibility for contention on the write/erase
 		 * operations which are global to the real chip and not per
@@ -858,6 +863,14 @@ static int get_chip(struct map_info *map
 				return ret;
 			}
 			spin_lock(&shared->lock);
+			
+			/* We should not own chip if it is already in FL_SYNCING state.
+			 * Put contender and retry. */
+			if (chip->state == FL_SYNCING) {
+				put_chip(map, contender, contender->start);
+				spin_unlock(contender->mutex);
+				goto retry;
+			}
 			spin_unlock(contender->mutex);
 		}