JFFS2 & NAND failure

Wed Feb 2 11:21:43 EST 2005

Estelle HAMMACHE wrote:
> 
> Hi everyone,
> 
> it seems there is a problem with jffs2_wbuf_recover and
> the wbuf_sem...
> 
> jffs2_flash_writev
> ** down_write(&c->wbuf_sem);  !!!
> ** __jffs2_flush_wbuf
> **** jffs2_wbuf_recover
> ******  jffs2_block_refile
> ********  nextblock = NULL;
> ******  jffs2_reserve_space_gc
> ********  jffs2_do_reserve_space
> **********  jffs2_erase_pending_blocks
> ************  jffs2_mark_erased_block
> **************  jffs2_flash_read
> ****************  down_read(&c->wbuf_sem); !!!
> 

After some thinking I wrote a smallish patch to correct
this part of the problem (edited below to show the full
function). If there are no objections I will commit it
this weekend. The wbuf semaphore is now taken only after
the first checks. I don't believe this can cause trouble,
because the wbuf is never freed once it is allocated,
and the later checks are enough to prevent copying
wrong wbuf contents.

The other case I mentioned (no erasing block, so
jffs2_do_reserve_space tries to flush the wbuf)
seems impossible: we would not have allowed writing
in this case.

bye
Estelle

 int jffs2_flash_read(struct jffs2_sb_info *c, loff_t ofs, size_t len, size_t *retlen, u_char *buf)
 {
 	loff_t	orbf = 0, owbf = 0, lwbf = 0;
 	int	ret;

 	/* Read flash */
 	if (!jffs2_can_mark_obsolete(c)) {
-		down_read(&c->wbuf_sem);

 		if (jffs2_cleanmarker_oob(c))
 			ret = c->mtd->read_ecc(c->mtd, ofs, len, retlen, buf, NULL, c->oobinfo);
 		else
 			ret = c->mtd->read(c->mtd, ofs, len, retlen, buf);

 		if ( (ret == -EBADMSG) && (*retlen == len) ) {
 			printk(KERN_WARNING "mtd->read(0x%zx bytes from 0x%llx) returned ECC error\n",
 			       len, ofs);
 			/* 
 			 * We have the raw data without ECC correction in the buffer, maybe 
 			 * we are lucky and all data or parts are correct. We check the node.
 			 * If data are corrupted node check will sort it out.
 			 * We keep this block, it will fail on write or erase and the we
 			 * mark it bad. Or should we do that now? But we should give him a chance.
 			 * Maybe we had a system crash or power loss before the ecc write or  
 			 * a erase was completed.
 			 * So we return success. :)
 			 */
 		 	ret = 0;
 		 }	
 	} else
 		return c->mtd->read(c->mtd, ofs, len, retlen, buf);

 	/* if no writebuffer available or write buffer empty, return */
 	if (!c->wbuf_pagesize || !c->wbuf_len)
-		goto exit;
+		return ret;

 	/* if we read in a different block, return */
 	if ( (ofs & ~(c->sector_size-1)) != (c->wbuf_ofs & ~(c->sector_size-1)) ) 
-		goto exit;
+		return ret;
+
+	/* Lock only if we have reason to believe wbuf contains relevant data,
+	   so that checking an erased block during wbuf recovery space allocation
+	   does not deadlock. */
+	down_read(&c->wbuf_sem);

 	if (ofs >= c->wbuf_ofs) {
 		owbf = (ofs - c->wbuf_ofs);	/* offset in write buffer */
 		if (owbf > c->wbuf_len)		/* is read beyond write buffer ? */
 			goto exit;
 		lwbf = c->wbuf_len - owbf;	/* number of bytes to copy */
 		if (lwbf > len)	
 			lwbf = len;
 	} else {	
 		orbf = (c->wbuf_ofs - ofs);	/* offset in read buffer */
 		if (orbf > len)			/* is write beyond write buffer ? */
 			goto exit;
 		lwbf = len - orbf; 		/* number of bytes to copy */
 		if (lwbf > c->wbuf_len)	
 			lwbf = c->wbuf_len;
 	}	
 	if (lwbf > 0)
 		memcpy(buf+orbf,c->wbuf+owbf,lwbf);

 exit:
 	up_read(&c->wbuf_sem);
 	return ret;
 }
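
For anyone who wants to check the offset arithmetic at the end of
the function in isolation, here is a small standalone sketch of
the same overlay step. It is a simplification written for this
mail, not the kernel code: overlay_wbuf and its parameters are
hypothetical names, and the boundary tests use >= so that the
zero-length case returns early instead of relying on the final
"lwbf > 0" check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Overlay the pending write buffer, which covers
 * [wbuf_ofs, wbuf_ofs + wbuf_len), onto data just read from flash
 * into buf, which covers [ofs, ofs + len).
 * Returns the number of bytes copied from wbuf into buf.
 */
static size_t overlay_wbuf(unsigned char *buf, unsigned long long ofs,
			   size_t len, const unsigned char *wbuf,
			   unsigned long long wbuf_ofs, size_t wbuf_len)
{
	size_t orbf = 0;	/* offset in read buffer */
	size_t owbf = 0;	/* offset in write buffer */
	size_t lwbf;		/* number of bytes to copy */

	assert(buf != NULL && wbuf != NULL);

	if (ofs >= wbuf_ofs) {
		/* Read starts inside or after the write buffer. */
		if (ofs - wbuf_ofs >= wbuf_len)
			return 0;
		owbf = (size_t)(ofs - wbuf_ofs);
		lwbf = wbuf_len - owbf;
		if (lwbf > len)
			lwbf = len;
	} else {
		/* Read starts before the write buffer. */
		if (wbuf_ofs - ofs >= len)
			return 0;
		orbf = (size_t)(wbuf_ofs - ofs);
		lwbf = len - orbf;
		if (lwbf > wbuf_len)
			lwbf = wbuf_len;
	}

	memcpy(buf + orbf, wbuf + owbf, lwbf);
	return lwbf;
}
```

For example, an 8 byte read at offset 100 with a 4 byte wbuf at
offset 104 takes the else branch with orbf = 4 and copies all
4 wbuf bytes into buf[4..7].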