[PATCH] mtd: cfi_cmdset_0002: Change erase functions to retry for error

Wed May 9 03:20:27 PDT 2018

Hi Jocke-san,

  Thanks for the mail.
  I have just understood about the Erase suspend issue is not related to the change.
  Also it is not able to be resolved by the chagne.

Hi Boris-san,

  So the below commit message is not correct so I will update it to remove to describe the latter part about the Erase suspend issue.

> > > Also the change is possible to resolve other erase error.
> > > For example the Erase suspend issue was caused on Cypress AMD NOR flash.
> > >   S29GL01GS / S29GL512S / S29GL256S / S29GL128S 

Regards,
Ikegami

> -----Original Message-----
> From: Joakim Tjernlund [mailto:Joakim.Tjernlund at infinera.com]
> Sent: Wednesday, May 09, 2018 5:19 PM
> To: boris.brezillon at bootlin.com; IKEGAMI Tokunori
> Cc: linux-mtd at lists.infradead.org; PACKHAM Chris;
> cyrille.pitchen at wedev4u.fr; dwmw2 at infradead.org;
> computersforpeace at gmail.com; boris.brezillon at free-electrons.com;
> marek.vasut at gmail.com; richard at nod.at
> Subject: Re: [PATCH] mtd: cfi_cmdset_0002: Change erase functions to retry
> for error
> 
> On Tue, 2018-05-08 at 19:11 +0200, Boris Brezillon wrote:
> > CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you recognize the sender and know
> the content is safe.
> >
> >
> > +Joakim
> >
> > On Tue, 8 May 2018 16:52:55 +0000
> > IKEGAMI Tokunori <ikegami at allied-telesis.co.jp> wrote:
> >
> > > From: Tokunori Ikegami <ikegami at allied-telesis.co.jp>
> > >
> > > For the word write functions it is retried for error.
> > > But it is not implemented to retry for the erase functions.
> > > To make sure for the erase functions change to retry as same.
> > >
> > > This is needed to prevent the flash erase error caused only once.
> > > It was caused by the error case of chip_good() in the
> do_erase_oneblock().
> > > Also it was confirmed on the MACRONIX flash device MX29GL512FHT2I-11G.
> > > But the error issue behavior is not able to reproduce at this moment.
> >
> > Hm, that's a problem. How can you be sure the retry approach fixes the
> > issue if you can't reproduce the it?
> >
> > > The flash controller is parallel Flash interface integrated on BCM53003.
> > >
> > > Also the change is possible to resolve other erase error.
> > > For example the Erase suspend issue was caused on Cypress AMD NOR flash.
> > >   S29GL01GS / S29GL512S / S29GL256S / S29GL128S
> >
> > Is the Erase suspend issue related to this problem?
> >
> 
> My Erase suspend issue is not related to just retrying the erase.
> We have seen several times that too many Erase suspend for one block will
> cause 5.5.2.6 DQ5:
>  Exceeded Timing Limits
> This condition is not checked for, nor handled in any way.
> Once this condition has occurred, a simple retry wont fix it. One must
> first issue a flash reset:
>   map_write( map, CMD(0xF0), chip->in_progress_block_addr); /* Reset */
> which will clear this error condition and return the flash to read array
> more.
> Now you can restart the erase again and it will probably finish the erase
> this time(at least in our testing).
> 
> Testing for DQ5: Exceeded Timing Limits is not straight forward in current
> 0002 driver
> since it requires more than just testing for toggling bits. I have not
> made any progress on this due to other more pressing work.
> 
> Note: In our testing we needed over 300000 suspend/resume before hitting
> the DQ5
> limit.
> Here is my test hack so far:
> diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c
> b/drivers/mtd/chips/cfi_cmdset_0002.c
> index 23a893ab4264..38b52e2c6fb6 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0002.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0002.c
> @@ -829,6 +829,7 @@ static int get_chip(struct map_info *map, struct flchip
> *chip, unsigned long adr
>                 chip->oldstate = FL_ERASING;
>                 chip->state = FL_ERASE_SUSPENDING;
>                 chip->erase_suspended = 1;
> +               chip->in_progress_suspends++;
>                 for (;;) {
>                         if (chip_ready(map, adr))
>                                 break;
> @@ -839,8 +840,33 @@ static int get_chip(struct map_info *map, struct flchip
> *chip, unsigned long adr
>                                  * there was an error (so leave the erase
>                                  * routine to recover from it) or we trying
> to
>                                  * use the erase-in-progress sector. */
> +                               map_word status = map_read(map, adr);
> +
> +                               map_write( map, CMD(0xF0),
> chip->in_progress_block_addr); /* Reset */
> +#if 1
> +                               /* Restart erase */
> +                               cfi_send_gen_cmd(0xAA,
> cfi->addr_unlock1, chip->start, map, cfi,
> +                                                cfi->device_type,
> NULL);
> +                               cfi_send_gen_cmd(0x55,
> cfi->addr_unlock2, chip->start, map, cfi,
> +                                                cfi->device_type,
> NULL);
> +                               cfi_send_gen_cmd(0x80,
> cfi->addr_unlock1, chip->start, map, cfi,
> +                                                cfi->device_type,
> NULL);
> +                               cfi_send_gen_cmd(0xAA,
> cfi->addr_unlock1, chip->start, map, cfi,
> +                                                cfi->device_type,
> NULL);
> +                               cfi_send_gen_cmd(0x55,
> cfi->addr_unlock2, chip->start, map, cfi,
> +                                                cfi->device_type,
> NULL);
> +                               map_write(map, cfi->sector_erase_cmd,
> chip->in_progress_block_addr);
> +#endif
> +//                             chip->oldstate = FL_READY;
> +//                             chip->state = FL_READY;
> +
>                                 put_chip(map, chip, adr);
> -                               printk(KERN_ERR "MTD %s(): chip not ready
> after erase suspend\n", __func__);
> +                               printk(KERN_ERR "MTD %s(): chip not ready
> after erase suspend, block_addr:0x%lx, "
> +                                      "block_mask:0x%lx, adr:0x%lx,
> suspends:%ld, status:0x%lx \n", __func__,
> +                                      chip->in_progress_block_addr,
> chip->in_progress_block_addr, adr,
> +                                      chip->in_progress_suspends,
> +                                      status.x[0]);
> +
>                                 return -EIO;
>                         }
> 
> @@ -2270,7 +2296,7 @@ static int __xipram do_erase_chip(struct map_info
> *map, struct flchip *chip)
>         chip->erase_suspended = 0;
>         chip->in_progress_block_addr = adr;
>         chip->in_progress_block_mask = ~(map->size - 1);
> -
> +       chip->in_progress_suspends = 0;
>         INVALIDATE_CACHE_UDELAY(map, chip,
>                                 adr, map->size,
>                                 chip->erase_time*500);
> diff --git a/include/linux/mtd/flashchip.h
> b/include/linux/mtd/flashchip.h
> index 3529683f691e..8e7ba8244ced 100644
> --- a/include/linux/mtd/flashchip.h
> +++ b/include/linux/mtd/flashchip.h
> @@ -86,6 +86,7 @@ struct flchip {
>         unsigned int erase_suspended:1;
>         unsigned long in_progress_block_addr;
>         unsigned long in_progress_block_mask;
> +       unsigned long in_progress_suspends;
> 
>         struct mutex mutex;
>         wait_queue_head_t wq; /* Wait on here when we're waiting for the
> chip
> 
> 
> Which printed(when error occurred):
> MTD get_chip(): chip not ready after erase suspend, block_addr:0x3480000,
> block_mask:0x3480000, adr:0x3a7da84,
> suspends:319941, status:0x28
> 
> 
>  Jocke