imx27: No space left to write bad block table

Thu Apr 22 00:29:41 BST 2021

Hi Guillaume,

On Wed, Apr 21, 2021 at 5:44 PM Guillaume Tucker
<guillaume.tucker at collabora.com> wrote:

> Sorry I'm late to the party, was busy with some other kernelci
> issues.  I gather this is being reverted anyway now, but please
> let me know if you still need to check anything.  As far as I can
> tell, there hasn't been any automated bisection landing on this
> commit.

Thanks. Yes, we did the revert in linux-next, but I could not see the
next-20210421 boot log for the imx27-phytec-phycard-s-rdk board to
confirm that the NAND bad block table can be found again.

Thanks for your help

>
> It's generally possible to re-run anything, i.e. make a kernel
> build with a custom patchset and run one given test on any of the
> platforms in KernelCI.  There just isn't any public self-service
> for doing that (yet).
>
> Best wishes,
> Guillaume
>
> >>> I have tested this on an i.MX 6 by simulating two bad BBT blocks by simply
> >>> returning -EIO in nand_erase_nand when the block to be erased is one of the
> >>> first two BBT blocks.
> >>>
> >>> I have seen this once on a customer board but was not able to reproduce it
> >>> anymore, thus the simulation of the two bad blocks.
> >>>
> >>> Without the patch below, new versions of the BBT can no longer be written to
> >>> the first two blocks reserved for the BBT, but they are still evaluated to
> >>> read the BBT from during boot due to the lack of a test if these blocks are
> >>> bad. So changes to the BBT after these two blocks turn bad are only kept and
> >>> used until the next reboot, where again the old version of the two worn
> >>> blocks is used as a basis.
> >>>
> >>> I tried to use the same mechanism that is used to identify bad blocks during
> >>> a scan for bad blocks. But maybe I missed something there? Or were my
> >>> assumptions wrong in the first place?
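For anyone who wants to reproduce the simulation described above, here is a minimal standalone sketch of the idea. It is not the actual change made to nand_erase_nand. The names sim_erase_block and sim_is_failing_bbt_block, the block count, and the assumption that the BBT search starts at the last block of the chip are all hypothetical and only there for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical geometry for illustration; a real chip reports its own. */
#define SIM_TOTAL_BLOCKS 1024
#define SIM_FAILING      2   /* number of BBT blocks that should fail to erase */

/*
 * True if 'block' is one of the first SIM_FAILING blocks visited by a BBT
 * search that starts at the last block and walks downwards (assumption).
 */
static bool sim_is_failing_bbt_block(int block)
{
	int first = SIM_TOTAL_BLOCKS - 1;

	return block <= first && block > first - SIM_FAILING;
}

/* Stand-in for an erase routine: returns -EIO for the chosen BBT blocks. */
static int sim_erase_block(int block)
{
	assert(block >= 0 && block < SIM_TOTAL_BLOCKS);

	if (sim_is_failing_bbt_block(block))
		return -EIO;

	return 0;	/* pretend the erase succeeded */
}
```

With these assumed values, erasing blocks 1023 and 1022 fails with -EIO while every other block erases normally.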
> >>
> >> Honestly I don't know what is wrong exactly in this patch.
> >>
> >> We will revert the commit as it clearly breaks something fundamental
> >> and the merge window is too close to adopt a hackish attitude.
> >>
> >> I would propose the following tests with your board:
> >> - Hack the core to allow yourself to access bad blocks from userspace
> >>   for testing purposes.
> >> - With the below commit, you should have the same behavior as
> >>   reported by Fabio.
> >> - Revert the commit.
> >> - Manually change the bad block markers (nanddump, flash_erase,
> >>   nandwrite) to declare the two tables bad. Reboot and observe if there
> >>   are any issues. You can try to work from there.
> >
> > Thanks for the input! I will follow your suggestions and let you guys know my
> > findings.
> >
> > Regards,
> > Stefan
> >
> >>
> >>>> ---8<---
> >>>>
> >>>> commit bd9c9fe2ad04546940f4a9979d679e62cae6aa51
> >>>> Author: Stefan Riedmueller <s.riedmueller at phytec.de>
> >>>> Date:   Thu Mar 25 11:23:37 2021 +0100
> >>>>
> >>>>     mtd: rawnand: bbt: Skip bad blocks when searching for the BBT in NAND
> >>>>
> >>>>     The blocks containing the bad block table can become bad as well. So
> >>>>     make sure to skip any blocks that are marked bad when searching for the
> >>>>     bad block table.
> >>>>
> >>>>     Otherwise in very rare cases where two BBT blocks wear out it might
> >>>>     happen that an obsolete BBT is used instead of a newer available
> >>>>     version.
> >>>>
> >>>>     Signed-off-by: Stefan Riedmueller <s.riedmueller at phytec.de>
> >>>>     Signed-off-by: Miquel Raynal <miquel.raynal at bootlin.com>
> >>>>     Link: https://lore.kernel.org/linux-mtd/20210325102337.481172-1-s.riedmueller@phytec.de
> >>>>
> >>>> diff --git a/drivers/mtd/nand/raw/nand_bbt.c b/drivers/mtd/nand/raw/nand_bbt.c
> >>>> index dced32a126d9..6e25a5ce5ba9 100644
> >>>> --- a/drivers/mtd/nand/raw/nand_bbt.c
> >>>> +++ b/drivers/mtd/nand/raw/nand_bbt.c
> >>>> @@ -525,6 +525,7 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> >>>>  {
> >>>>         u64 targetsize = nanddev_target_size(&this->base);
> >>>>         struct mtd_info *mtd = nand_to_mtd(this);
> >>>> +       struct nand_bbt_descr *bd = this->badblock_pattern;
> >>>>         int i, chips;
> >>>>         int startblock, block, dir;
> >>>>         int scanlen = mtd->writesize + mtd->oobsize;
> >>>> @@ -560,6 +561,10 @@ static int search_bbt(struct nand_chip *this, uint8_t *buf,
> >>>>                         int actblock = startblock + dir * block;
> >>>>                         loff_t offs = (loff_t)actblock << this->bbt_erase_shift;
> >>>>
> >>>> +                       /* Check if block is marked bad */
> >>>> +                       if (scan_block_fast(this, bd, offs, buf))
> >>>> +                               continue;
> >>>> +
> >>>>                         /* Read first page */
> >>>>                         scan_read(this, buf, offs, mtd->writesize, td);
> >>>>                         if (!check_pattern(buf, scanlen, mtd->writesize, td)) {
> >>>>
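To make the intent of the hunk above easier to follow, here is a small userspace model of the search loop with and without the added bad-block check. It is only a sketch of the intended behaviour and does not explain the imx27 regression. struct sim_block, sim_search_bbt and the skip_bad flag are hypothetical names; the real code reads the marker and the BBT pattern from flash via scan_block_fast() and check_pattern(), and the model assumes the search walks down from the last block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified view of one erase block, for illustration only. */
struct sim_block {
	bool marked_bad;	/* bad block marker is set */
	bool has_pattern;	/* BBT identification pattern is present */
	int version;		/* BBT version stored in the block */
};

/*
 * Walk at most 'maxblocks' blocks down from the last block and return the
 * index of the first one carrying a BBT pattern, or -1 if none is found.
 * With skip_bad set, blocks marked bad are ignored, as the patch intends.
 */
static int sim_search_bbt(const struct sim_block *blocks, size_t nblocks,
			  int maxblocks, bool skip_bad)
{
	int startblock = (int)nblocks - 1;

	assert(blocks != NULL);

	for (int block = 0; block < maxblocks; block++) {
		int actblock = startblock - block;

		if (actblock < 0)
			break;

		/* Check if block is marked bad */
		if (skip_bad && blocks[actblock].marked_bad)
			continue;

		if (blocks[actblock].has_pattern)
			return actblock;
	}

	return -1;
}
```

In this model, if the last two blocks are marked bad but still hold an old table and the third-to-last holds a newer one, the search without the check returns the stale last block, while the search with the check returns the newer table.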
> >>>>
> >>>> Thanks,
> >>>> Miquèl
> >>
> >> Thanks,
> >> Miquèl
>