spi-nor: excessive locking in spi_nor_erase()

Joakim Tjernlund Joakim.Tjernlund at infinera.com
Wed Apr 17 05:54:36 PDT 2024


On Wed, 2024-04-17 at 14:42 +0200, Pratyush Yadav wrote:
> On Tue, Apr 16 2024, Joakim Tjernlund wrote:
> 
> > On Tue, 2024-04-16 at 17:44 +0200, Pratyush Yadav wrote:
> > > Hi,
> > 
> > Hi Pratyush, thanks for replying. Se below
> > 
> > > 
> [...]
> > > > 
> > > > So erase locks the flash for the whole erase op which can include many sectors and I don't see
> > > > any Erase Suspend handling in spi-nor like the cfi_cmdset_0001/cfi_cmdset_0002 has.
> > > > 
> > > > Locking can be a challenge but this seems over the top, anyone working/looking into improving this?
> > > 
> > > I think you should lead with what the problem you see with this and what
> > > you think can be improved. Why do you think the locking is "over the
> > > top"? What should be improved?
> > 
> > Sure, we have been using NOR flash as RFS(JFFS2) for many years now and it
> > is important that RFS is responsive at all times. Up til recently we used parallel NOR
> > using cfi_cmdset_0001/cfi_cmdset_0002 which has worked great.
> > 
> > QSPI is different beast and its driver not mature for RFS use. Consider when our
> > sw update starts to erase Image B partition, then the flash becomes completely locked for other
> > processes to use, one cannot even connect with ssh while sw update is ongoing or do an ls.
> 
> I see. I assume the rootfs is on the NOR flash. While a big erase

Yes, same flash chip.

> (spanning multiple sectors) is running, other tasks do still get to run
> (since spi_nor_wait_till_ready() calls cond_resched()) but any task that
> tries to access the rootfs would have to freeze since the lock is held.
> 
> Dropping and re-acquiring the lock after erasing each sector or
> programming each page would let readers make progress in between. This
> shouldn't be too difficult to implement I reckon. You already mostly do
> it in your patch below.

Yes, that is simple.

> 
> Doing erase or program suspend would only make sense for flashes with
> larger sectors since even one sector erase would take a long time. For
> those, I suppose we could give reads priority over program or erase
> operations. When a read comes in, it suspends the erase, does the read,
> and then resumes it. This would be a little bit more complex to
> implement.

Right, a bit harder but I think it is needed like cfi_cmdset_0001 do: suspend erase to do read/write
Don't think cfi_cmdset suspends writes, that seems overkill.
> 
> I would suggest you try the former first and see if it already gives you
> the results you need. From your patch below, it seems it does. So
> perhaps just cleaning it up and turning it onto a proper patch would do
> the job for you.

Tests with my patch shows vastly improved responsivity but there is still some
way to go. Consider that an erase takes some 0.20-0.25 secs for us in best case in
which apps are blocked. We do get some complaints from the app taking too long to complete
some tasks during erase. 

> 
> > 
> > ATM I am testing this(which makes the system much more responsive:
> > diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> > index bcaae9c71d90..62fe9cb97139 100644
> > --- a/drivers/mtd/spi-nor/core.c
> > +++ b/drivers/mtd/spi-nor/core.c
> > @@ -1635,6 +1635,13 @@ static int spi_nor_erase_multi_sectors(struct spi_nor *nor, u64 addr, u32 len)
> >                         dev_vdbg(nor->dev, "erase_cmd->size = 0x%08x, erase_cmd->opcode = 0x%02x, erase_cmd->count = %u\n",
> >                                  cmd->size, cmd->opcode, cmd->count);
> >  
> > +                       spi_nor_unlock_and_unprep(nor);
> > +                       //cond_resched();
> > +                       schedule();
> > +                       ret = spi_nor_lock_and_prep(nor);
> > +                       if (ret)
> > +                               goto destroy_erase_cmd_list;
> > +
> >                         ret = spi_nor_write_enable(nor);
> >                         if (ret)
> >                                 goto destroy_erase_cmd_list;
> > @@ -1721,6 +1728,13 @@ static int spi_nor_erase(struct mtd_info *mtd, struct erase_info *instr)
> >         /* "sector"-at-a-time erase */
> >         } else if (spi_nor_has_uniform_erase(nor)) {
> >                 while (len) {
> > +                       spi_nor_unlock_and_unprep(nor);
> > +                       //cond_resched();
> > +                       schedule();
> > +                       ret = spi_nor_lock_and_prep(nor);
> > +                       if (ret)
> > +                               return ret;
> > +
> >                         ret = spi_nor_write_enable(nor);
> >                         if (ret)
> >                                 goto erase_err;
> > @@ -1987,6 +2001,9 @@ static int spi_nor_write(struct mtd_info *mtd, loff_t to, size_t len,
> >                 ssize_t written;
> >                 loff_t addr = to + i;
> >  
> > +               spi_nor_unlock_and_unprep(nor);
> > +               //cond_resched();
> > +               schedule();
> >                 /*
> >                  * If page_size is a power of two, the offset can be quickly
> >                  * calculated with an AND operation. On the other cases we
> > @@ -2005,6 +2022,10 @@ static int spi_nor_write(struct mtd_info *mtd, loff_t to, size_t len,
> >  
> >                 addr = spi_nor_convert_addr(nor, addr);
> >  
> > +               ret = spi_nor_lock_and_prep(nor);
> > +               if (ret)
> > +                       return ret;
> > +
> >                 ret = spi_nor_write_enable(nor);
> >                 if (ret)
> >                         goto write_err;
> > 
> > This is s good first step but not enough, one need to support Erase Suspend(which the vast majority of flash support)
> > to get full responsiveness(see cfi_cmdset_0001/cfi_cmdset_0002)
> > 
> > > 
> > > Most flashes can't do any operations (except poll status register) when
> > 
> > Not quite, see above comment on Erase Suspend
> 
> Yes, I meant in the absence of a suspend operation, which SPI NOR does
> not currently do.
> 
> > 
> > > a program or erase operation is in progress. So the locking here makes
> > > sense to me. We do not want to let other tasks attempt any other
> > > operations until the erase is done. The only exception is some
> > > multi-bank flashes that can do erase on one bank and reads on other
> > > banks. Miquel's series [1] takes care of those (though I do not see any
> > > flashes using that feature yet).
> > > 
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/mtd/spi-nor/core.c#n1789
> > > [1] https://lore.kernel.org/linux-mtd/20230328154105.448540-1-miquel.raynal@bootlin.com/
> > > 
> > 
> 



More information about the linux-mtd mailing list