imx27: No space left to write bad block table
Stefan Riedmüller
S.Riedmueller at phytec.de
Mon May 10 09:38:23 BST 2021
Hi Miquel,
On Tue, 2021-05-04 at 10:34 +0200, Miquel Raynal wrote:
> Hi Stefan,
>
> Stefan Riedmüller <S.Riedmueller at phytec.de> wrote on Mon, 26 Apr 2021
> 15:53:39 +0000:
>
> > Hi Miquel,
> >
> > On Mon, 2021-04-19 at 17:36 +0200, Miquel Raynal wrote:
> > > Hi Stefan,
> > >
> > > > > Interesting. Maybe I overlooked the below commit when applying.
> > > > > Indeed, BBT may be considered as bad blocks, so I wonder if the
> > > > > below change is valid now...
> > > > >
> > > > > Guillaume, would you have a way to revert this patch on top of
> > > > > linux-next? Stefan, would you mind giving more details on the
> > > > > testing procedure?
> > > >
> > > > I have tested this on an i.MX 6 by simulating two bad BBT blocks by
> > > > simply returning -EIO in nand_erase_nand when the block to be erased
> > > > is one of the first two BBT blocks.
> > > >
> > > > I have seen this once on a customer board but was not able to
> > > > reproduce it anymore, thus the simulation of the two bad blocks.
> > > >
> > > > Without the patch below, new versions of the BBT can no longer be
> > > > written to the first two blocks reserved for the BBT, but these
> > > > blocks are still read during boot because there is no test for
> > > > whether they are bad. So changes to the BBT made after these two
> > > > blocks turn bad are only kept and used until the next reboot, where
> > > > the old version from the two worn blocks is again used as a basis.
> > > >
> > > > I tried to use the same mechanism that is used to identify bad blocks
> > > > during a scan for bad blocks. But maybe I missed something there? Or
> > > > were my assumptions wrong in the first place?
> > >
> > > Honestly I don't know what is wrong exactly in this patch.
> > >
> > > We will revert the commit as it clearly breaks something fundamental
> > > and the merge window is too close to adopt a hackish attitude.
> > >
> > > I would propose the following tests with your board:
> > > - Hack the core to allow yourself to access bad blocks from userspace
> > > for testing purposes.
> > > - With the below commit, you should have the same behavior as
> > > reported by Fabio.
> >
> > On my imx6 board the patch does not lead to the behavior reported by
> > Fabio. The BBT is found and can be read:
> >
> > [ 1.520501] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xd3
> > [ 1.526944] nand: Macronix MX60LF8G18AC
> > [ 1.530803] nand: 1024 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > [ 1.539412] Bad block table found at page 524224, version 0x01
> > [ 1.545790] Bad block table found at page 524160, version 0x01
> > [ 1.551796] nand_read_bbt: bad block at 0x000001b60000
> > [ 1.557032] nand_read_bbt: bad block at 0x000008cc0000
> > [ 1.562204] nand_read_bbt: bad block at 0x00000f480000
> > [ 1.567395] nand_read_bbt: bad block at 0x0000111c0000
> > [ 1.572588] nand_read_bbt: bad block at 0x0000205c0000
> > [ 1.577802] nand_read_bbt: bad block at 0x00002dfc0000
> >
> > I dug a little deeper and I think I found the cause for the failure on the
> > imx27 board.
> >
> > The mxc_nand driver (used by the imx27) uses its own nand_bbt_descr with
> > an offset of 0 in the OOB area. This is the same place the bad block
> > marker is located on worn or factory bad blocks.
> >
> > This explains why the BBT is no longer found with my patch.
> > scan_block_fast checks if there is anything other than 0xff in the bad
> > block marker and finds the 'B' from 'Bbt0'. The same occurs for the
> > mirrored version, where it finds the '1' from '1tbB'.
>
> Ok, that's the reason why the original logic failed, thanks for looking
> for it.
>
> > This also explains why the original BBT is detected as bad blocks in the
> > scan after the BBT was not found, which results in the BBT being written
> > to the remaining two blocks reserved for the BBT.
> >
> > 19:38:23.001385 nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> > 19:38:23.002635 nand: ST Micro NAND01GR3B2CZA6
> > 19:38:23.006666 nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > 19:38:23.028413 Bad block table not found for chip 0
> > 19:38:23.035625 random: fast init done
> > 19:38:23.049144 Bad block table not found for chip 0
> > 19:38:23.050024 Scanning device for bad blocks
> > 19:38:23.330999 Bad eraseblock 329 at 0x000002920000
> > 19:38:23.345958 Bad eraseblock 330 at 0x000002940000
> > 19:38:23.356024 Bad eraseblock 331 at 0x000002960000
> > 19:38:23.365738 Bad eraseblock 332 at 0x000002980000
> > 19:38:23.375590 Bad eraseblock 333 at 0x0000029a0000
> > 19:38:23.385505 Bad eraseblock 334 at 0x0000029c0000
> > 19:38:23.395548 Bad eraseblock 335 at 0x0000029e0000
> > 19:38:23.405501 Bad eraseblock 336 at 0x000002a00000
> > 19:38:23.415551 Bad eraseblock 337 at 0x000002a20000
> > 19:38:23.425937 Bad eraseblock 338 at 0x000002a40000
> > 19:38:23.436028 Bad eraseblock 339 at 0x000002a60000
> > 19:38:23.445959 Bad eraseblock 340 at 0x000002a80000
> > 19:38:23.456008 Bad eraseblock 341 at 0x000002aa0000
> > 19:38:23.466006 Bad eraseblock 342 at 0x000002ac0000
> > 19:38:23.475912 Bad eraseblock 343 at 0x000002ae0000
> > 19:38:23.486064 Bad eraseblock 344 at 0x000002b00000
> > 19:38:23.495925 Bad eraseblock 345 at 0x000002b20000
> > 19:38:24.048053 Bad eraseblock 1022 at 0x000007fc0000
> > 19:38:24.056117 Bad eraseblock 1023 at 0x000007fe0000
> > 19:38:24.067953 Bad block table written to 0x000007fa0000, version 0x01
> > 19:38:24.087637 Bad block table written to 0x000007f80000, version 0x01
> >
> >
> > On the next boot all four BBT versions in flash are skipped for the same
> > reason as before, and the two blocks containing the latest BBT are also
> > detected as bad blocks. The result is that no blocks remain to write the
> > BBT to.
> >
> >
> > 21:22:55.032595 nand: device found, Manufacturer ID: 0x20, Chip ID: 0xa1
> > 21:22:55.033333 nand: ST Micro NAND01GR3B2CZA6
> > 21:22:55.037804 nand: 128 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > 21:22:55.088475 Bad block table not found for chip 0
> > 21:22:55.093807 Bad block table not found for chip 0
> > 21:22:55.105995 Scanning device for bad blocks
> > 21:22:55.109049 random: fast init done
> > 21:22:55.395488 Bad eraseblock 329 at 0x000002920000
> > 21:22:55.406832 Bad eraseblock 330 at 0x000002940000
> > 21:22:55.416885 Bad eraseblock 331 at 0x000002960000
> > 21:22:55.426736 Bad eraseblock 332 at 0x000002980000
> > 21:22:55.436732 Bad eraseblock 333 at 0x0000029a0000
> > 21:22:55.446864 Bad eraseblock 334 at 0x0000029c0000
> > 21:22:55.456662 Bad eraseblock 335 at 0x0000029e0000
> > 21:22:55.466785 Bad eraseblock 336 at 0x000002a00000
> > 21:22:55.476801 Bad eraseblock 337 at 0x000002a20000
> > 21:22:55.486772 Bad eraseblock 338 at 0x000002a40000
> > 21:22:55.496768 Bad eraseblock 339 at 0x000002a60000
> > 21:22:55.506607 Bad eraseblock 340 at 0x000002a80000
> > 21:22:55.516965 Bad eraseblock 341 at 0x000002aa0000
> > 21:22:55.526621 Bad eraseblock 342 at 0x000002ac0000
> > 21:22:55.536702 Bad eraseblock 343 at 0x000002ae0000
> > 21:22:55.546660 Bad eraseblock 344 at 0x000002b00000
> > 21:22:55.556745 Bad eraseblock 345 at 0x000002b20000
> > 21:22:56.172928 Bad eraseblock 1020 at 0x000007f80000
> > 21:22:56.187043 Bad eraseblock 1021 at 0x000007fa0000
> > 21:22:56.197437 Bad eraseblock 1022 at 0x000007fc0000
> > 21:22:56.212665 Bad eraseblock 1023 at 0x000007fe0000
> > 21:22:56.213356 No space left to write bad block table
> > 21:22:56.215012 nand_bbt: error while writing bad block table -28
> > 21:22:56.239353 mxc_nand: probe of d8000000.nand-controller failed with error -28
> >
> > I'm not sure of the best way to address this issue. A few ideas came to
> > mind:
> >
> > - Shift the offset of the nand_bbt_descr of mxc_nand to make room for the
> > bad block marker. I'm not sure whether this would conflict with the ECC
> > hardware, but the ooblayout functions suggest that it could work.
>
> There are thousands of boards out there that would be broken by such a
> change: it's too late to make changes in this driver, unfortunately.
>
> > Unfortunately I don't have any hardware at hand at the moment to test it.
> > I think the distinction between small and large page sizes needs to be
> > reflected in the bbt_descr as well.
> >
> > - Use NAND_BBT_NO_OOB with the mxc_nand driver, since there is a comment
> > saying there is an overlap between the generic bbt descriptors and the
> > ECC hardware. I'm not sure what other effects setting NAND_BBT_NO_OOB
> > might have.
>
> Same here: that's not an option.
>
> > - Explicitly check for the bad block marker during a search for the BBT
> > instead of using scan_block_fast
>
> This looks more reasonable. You can create a helper which does the
> scan_block_fast(), then eventually checks the beginning of the OOB
> buffer and tries to match with the ->td and ->md descriptors. This
> should work with all the legacy drivers implementing their own
> descriptors - hopefully.
Thanks for your input. I will take another spin at it.
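
To make sure I understood the suggestion, here is a rough, untested
userspace sketch of the check. It is not kernel code: struct bbt_pattern
is a hypothetical, cut-down stand-in for struct nand_bbt_descr (only
offs, len and pattern), and the function names are made up for
illustration. It assumes the OOB bytes of the candidate block are already
in a buffer and that the bad block marker is a single byte at bbm_offs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct nand_bbt_descr. */
struct bbt_pattern {
	size_t offs;            /* offset of the ident pattern in the OOB area */
	size_t len;             /* length of the ident pattern in bytes */
	const uint8_t *pattern; /* e.g. "Bbt0" or its mirror "1tbB" */
};

/* Marker test as described above: anything other than 0xff counts as bad. */
static bool oob_marker_set(const uint8_t *oob, size_t bbm_offs)
{
	return oob[bbm_offs] != 0xff;
}

/* True if the OOB buffer carries the ident pattern of the descriptor. */
static bool oob_has_pattern(const uint8_t *oob, size_t oob_len,
			    const struct bbt_pattern *d)
{
	if (!d || d->offs + d->len > oob_len)
		return false;
	return memcmp(oob + d->offs, d->pattern, d->len) == 0;
}

/*
 * Decide whether a block should be skipped while searching for the BBT.
 * A set marker only disqualifies the block if the bytes found there are
 * not explained by the main (td) or mirror (md) ident pattern.
 */
static bool bbt_search_skip_block(const uint8_t *oob, size_t oob_len,
				  size_t bbm_offs,
				  const struct bbt_pattern *td,
				  const struct bbt_pattern *md)
{
	if (!oob_marker_set(oob, bbm_offs))
		return false;
	if (oob_has_pattern(oob, oob_len, td) ||
	    oob_has_pattern(oob, oob_len, md))
		return false;
	return true;
}
```

With descriptors at offset 0 as in mxc_nand, a block whose OOB starts with
'Bbt0' or '1tbB' would not be skipped, while a block whose first OOB byte
is, say, 0x00 would be. Whether this holds on real hardware still needs
testing.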
>
> Other drivers are impacted as well, so maybe you'll find a board for
> testing (or someone kind enough to test it for you).
I hope I'll get my hands on at least one of the imx27 boards.
Thanks,
Stefan
>
> Thanks,
> Miquèl