mtd: nand: increase ready wait timeout and report timeouts

Linux-MTD Mailing List linux-mtd at lists.infradead.org
Fri Nov 6 10:59:27 PST 2015


Gitweb:     http://git.infradead.org/?p=mtd-2.6.git;a=commit;h=b70af9bef49bd9a5f4e7a2327d9074e29653e665
Commit:     b70af9bef49bd9a5f4e7a2327d9074e29653e665
Parent:     2382960793c2480277ae98a891ea5aa566e06ff1
Author:     Alex Smith <alex.smith at imgtec.com>
AuthorDate: Tue Oct 6 14:52:07 2015 +0100
Committer:  Brian Norris <computersforpeace at gmail.com>
CommitDate: Mon Oct 26 13:02:51 2015 -0700

    mtd: nand: increase ready wait timeout and report timeouts
    
    If nand_wait_ready() times out, this is silently ignored, and its
    caller will then proceed to read from/write to the chip before it is
    ready. This can potentially result in corruption with no indication as
    to why.
    
    While a 20ms timeout seems like it should be plenty, certain
    behaviour can cause it to time out much earlier than expected. The
    situation which prompted this change was that CPU 0, which is
    responsible for updating jiffies, was holding interrupts disabled
    for a fairly long time while writing to the console during a printk,
    causing several jiffies updates to be delayed. If CPU 1 happens to
    enter the timeout loop in nand_wait_ready() just before CPU 0
    re-enables interrupts and updates jiffies, CPU 1 will immediately time
    out when the delayed jiffies updates are made. The result of this is
    that nand_wait_ready() actually waits less time than the NAND chip
    would normally take to be ready, and then read_page() proceeds to
    read out bad data from the chip.
    
    The situation described above may seem unlikely, but in fact it can be
    reproduced almost every boot on the MIPS Creator Ci20.
    
    Therefore, this patch increases the timeout to 400ms. This should be
    enough to cover cases where jiffies updates get delayed. In nand_wait()
    the timeout was previously chosen based on whether the chip was erasing
    or programming. This is changed to be 400ms unconditionally as well to
    avoid similar problems there. nand_wait() is also slightly refactored
    to be consistent with nand_wait{,_status}_ready(). These changes should
    have no effect during normal operation.
    
    Debugging this was made more difficult by the misleading comment above
    nand_wait_ready() stating "The timeout is caught later" - no timeout was
    ever reported, leading me away from the real source of the problem.
    Therefore, a pr_warn() is added when a timeout does occur so that it is
    easier to pinpoint similar problems in future.
    
    Signed-off-by: Alex Smith <alex.smith at imgtec.com>
    Signed-off-by: Harvey Hunt <harvey.hunt at imgtec.com>
    Reviewed-by: Niklas Cassel <niklas.cassel at axis.com>
    Cc: Alex Smith <alex at alex-smith.me.uk>
    Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel at imgtec.com>
    Cc: David Woodhouse <dwmw2 at infradead.org>
    Cc: Niklas Cassel <niklas.cassel at axis.com>
    Signed-off-by: Brian Norris <computersforpeace at gmail.com>
---
 drivers/mtd/nand/nand_base.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
index d87c7d0..cc74142 100644
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
 	}
 }
 
-/* Wait for the ready pin, after a command. The timeout is caught later. */
+/**
+ * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
+ * @mtd: MTD device structure
+ *
+ * Wait for the ready pin after a command, and warn if a timeout occurs.
+ */
 void nand_wait_ready(struct mtd_info *mtd)
 {
 	struct nand_chip *chip = mtd->priv;
-	unsigned long timeo = jiffies + msecs_to_jiffies(20);
+	unsigned long timeo = 400;
 
-	/* 400ms timeout */
 	if (in_interrupt() || oops_in_progress)
-		return panic_nand_wait_ready(mtd, 400);
+		return panic_nand_wait_ready(mtd, timeo);
 
 	led_trigger_event(nand_led_trigger, LED_FULL);
 	/* Wait until command is processed or timeout occurs */
+	timeo = jiffies + msecs_to_jiffies(timeo);
 	do {
 		if (chip->dev_ready(mtd))
-			break;
-		touch_softlockup_watchdog();
+			goto out;
+		cond_resched();
 	} while (time_before(jiffies, timeo));
+
+	pr_warn_ratelimited(
+		"timeout while waiting for chip to become ready\n");
+out:
 	led_trigger_event(nand_led_trigger, LED_OFF);
 }
 EXPORT_SYMBOL_GPL(nand_wait_ready);
@@ -885,15 +894,13 @@ static void panic_nand_wait(struct mtd_info *mtd, struct nand_chip *chip,
  * @mtd: MTD device structure
  * @chip: NAND chip structure
  *
- * Wait for command done. This applies to erase and program only. Erase can
- * take up to 400ms and program up to 20ms according to general NAND and
- * SmartMedia specs.
+ * Wait for command done. This applies to erase and program only.
  */
 static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 {
 
-	int status, state = chip->state;
-	unsigned long timeo = (state == FL_ERASING ? 400 : 20);
+	int status;
+	unsigned long timeo = 400;
 
 	led_trigger_event(nand_led_trigger, LED_FULL);
 
@@ -909,7 +916,7 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 		panic_nand_wait(mtd, chip, timeo);
 	else {
 		timeo = jiffies + msecs_to_jiffies(timeo);
-		while (time_before(jiffies, timeo)) {
+		do {
 			if (chip->dev_ready) {
 				if (chip->dev_ready(mtd))
 					break;
@@ -918,7 +925,7 @@ static int nand_wait(struct mtd_info *mtd, struct nand_chip *chip)
 					break;
 			}
 			cond_resched();
-		}
+		} while (time_before(jiffies, timeo));
 	}
 	led_trigger_event(nand_led_trigger, LED_OFF);
 



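The race described in the commit message can be reproduced in a small
user-space model. The sketch below is not kernel code: wait_ready_sim(),
tick_before() and every parameter are hypothetical names chosen for
illustration, and the timeout is expressed directly in ticks rather than
in milliseconds. It also assumes that no tick other than the delayed
backlog arrives while the chip is busy.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is earlier than b", in the spirit of the kernel's time_before(). */
static bool tick_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical model of the polling loop (illustration only).
 *
 * timeout_ticks: budget given to the loop, in ticks
 * backlog_ticks: delayed tick updates that are applied all at once
 *                right after the first poll (CPU 0 re-enabling IRQs)
 * busy_polls:    number of polls for which the chip still reports busy
 *
 * Returns true if the chip was seen ready before the deadline passed.
 */
static bool wait_ready_sim(unsigned long timeout_ticks,
			   unsigned long backlog_ticks,
			   unsigned long busy_polls)
{
	unsigned long ticks = 1000;	/* stale counter: updates are held back */
	unsigned long deadline = ticks + timeout_ticks;
	unsigned long polls = 0;

	do {
		if (polls >= busy_polls)
			return true;	/* chip reported ready in time */
		polls++;
		if (polls == 1)
			ticks += backlog_ticks;	/* backlog lands at once */
	} while (tick_before(ticks, deadline));

	return false;	/* deadline passed while the chip was still busy */
}

/* The three cases discussed in the commit message, as assertions. */
static void wait_ready_sim_demo(void)
{
	assert(wait_ready_sim(20, 0, 3));	/* no backlog: ready in time */
	assert(!wait_ready_sim(20, 25, 3));	/* backlog > budget: spurious timeout */
	assert(wait_ready_sim(400, 25, 3));	/* larger budget absorbs the backlog */
}
```

In this model a 20-tick budget is consumed entirely by a 25-tick backlog,
so the loop gives up after a single poll even though almost no real time
has passed. A 400-tick budget survives the same backlog. Because the loop
is a do/while, the ready check also runs at least once, even when the
deadline has already passed on entry.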