spi-nor: excessive locking in spi_nor_erase()

Joakim Tjernlund Joakim.Tjernlund at infinera.com
Tue Apr 16 11:03:39 PDT 2024


On Tue, 2024-04-16 at 17:44 +0200, Pratyush Yadav wrote:
> Hi,

Hi Pratyush, thanks for replying. Se below

> 
> On Mon, Apr 15 2024, Joakim Tjernlund wrote:
> 
> > static int spi_nor_erase(struct mtd_info *mtd, struct erase_info *instr)
> > {
> > ...
> > Lock here
> > 	ret = spi_nor_lock_and_prep(nor);
> > 	if (ret)
> > 		return ret;
> > ....
> > Then we have:
> > 	} else if (spi_nor_has_uniform_erase(nor)) {
> > 		while (len) {
> > 			ret = spi_nor_write_enable(nor);
> > 			if (ret)
> > 				goto erase_err;
> > 
> > 			ret = spi_nor_erase_sector(nor, addr);
> > 			if (ret)
> > 				goto erase_err;
> > 
> > 			ret = spi_nor_wait_till_ready(nor);
> > 			if (ret)
> > 				goto erase_err;
> > 
> > 			addr += mtd->erasesize;
> > 			len -= mtd->erasesize;
> > 		}
> > 
> > 	/* erase multiple sectors */
> > 	} else {
> > 		ret = spi_nor_erase_multi_sectors(nor, addr, len);
> > 		if (ret)
> > 			goto erase_err;
> > 	}
> > ....
> > erase_err:
> > unlock here
> > 	spi_nor_unlock_and_unprep(nor);
> > 
> > 	return ret;
> > }
> 
> Firstly, seems like you are looking at an older version of the driver.
> spi_nor_erase() looks a bit different now, though works pretty much the
> same in practice. See [0].

Ah, yes. I am on 5.15.x, not ready to move forward on this product yet.

> 
> > 
> > So erase locks the flash for the whole erase op which can include many sectors and I don't see
> > any Erase Suspend handling in spi-nor like the cfi_cmdset_0001/cfi_cmdset_0002 has.
> > 
> > Locking can be a challenge but this seems over the top, anyone working/looking into improving this?
> 
> I think you should lead with what the problem you see with this and what
> you think can be improved. Why do you think the locking is "over the
> top"? What should be improved?

Sure, we have been using NOR flash as RFS(JFFS2) for many years now and it
is important that RFS is responsive at all times. Up til recently we used parallel NOR
using cfi_cmdset_0001/cfi_cmdset_0002 which has worked great.

QSPI is different beast and its driver not mature for RFS use. Consider when our
sw update starts to erase Image B partition, then the flash becomes completely locked for other
processes to use, one cannot even connect with ssh while sw update is ongoing or do an ls.

ATM I am testing this(which makes the system much more responsive:
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index bcaae9c71d90..62fe9cb97139 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -1635,6 +1635,13 @@ static int spi_nor_erase_multi_sectors(struct spi_nor *nor, u64 addr, u32 len)
                        dev_vdbg(nor->dev, "erase_cmd->size = 0x%08x, erase_cmd->opcode = 0x%02x, erase_cmd->count = %u\n",
                                 cmd->size, cmd->opcode, cmd->count);
 
+                       spi_nor_unlock_and_unprep(nor);
+                       //cond_resched();
+                       schedule();
+                       ret = spi_nor_lock_and_prep(nor);
+                       if (ret)
+                               goto destroy_erase_cmd_list;
+
                        ret = spi_nor_write_enable(nor);
                        if (ret)
                                goto destroy_erase_cmd_list;
@@ -1721,6 +1728,13 @@ static int spi_nor_erase(struct mtd_info *mtd, struct erase_info *instr)
        /* "sector"-at-a-time erase */
        } else if (spi_nor_has_uniform_erase(nor)) {
                while (len) {
+                       spi_nor_unlock_and_unprep(nor);
+                       //cond_resched();
+                       schedule();
+                       ret = spi_nor_lock_and_prep(nor);
+                       if (ret)
+                               return ret;
+
                        ret = spi_nor_write_enable(nor);
                        if (ret)
                                goto erase_err;
@@ -1987,6 +2001,9 @@ static int spi_nor_write(struct mtd_info *mtd, loff_t to, size_t len,
                ssize_t written;
                loff_t addr = to + i;
 
+               spi_nor_unlock_and_unprep(nor);
+               //cond_resched();
+               schedule();
                /*
                 * If page_size is a power of two, the offset can be quickly
                 * calculated with an AND operation. On the other cases we
@@ -2005,6 +2022,10 @@ static int spi_nor_write(struct mtd_info *mtd, loff_t to, size_t len,
 
                addr = spi_nor_convert_addr(nor, addr);
 
+               ret = spi_nor_lock_and_prep(nor);
+               if (ret)
+                       return ret;
+
                ret = spi_nor_write_enable(nor);
                if (ret)
                        goto write_err;

This is s good first step but not enough, one need to support Erase Suspend(which the vast majority of flash support)
to get full responsiveness(see cfi_cmdset_0001/cfi_cmdset_0002)

> 
> Most flashes can't do any operations (except poll status register) when

Not quite, see above comment on Erase Suspend

> a program or erase operation is in progress. So the locking here makes
> sense to me. We do not want to let other tasks attempt any other
> operations until the erase is done. The only exception is some
> multi-bank flashes that can do erase on one bank and reads on other
> banks. Miquel's series [1] takes care of those (though I do not see any
> flashes using that feature yet).
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/mtd/spi-nor/core.c#n1789
> [1] https://lore.kernel.org/linux-mtd/20230328154105.448540-1-miquel.raynal@bootlin.com/
> 



More information about the linux-mtd mailing list