raw/omap2: erasing issue

Roger Quadros rogerq at kernel.org
Sat Jul 9 23:52:13 PDT 2022


Hello Yegor,

On 05/07/2022 17:46, Yegor Yefremov wrote:
> Hi Roger,
> 
> On Mon, Jul 4, 2022 at 12:31 PM Yegor Yefremov
> <yegorslists at googlemail.com> wrote:
>>
>> Hi Roger,
>>
>> On Mon, Jul 4, 2022 at 12:28 PM Roger Quadros <rogerq at kernel.org> wrote:
>>>
>>> Hello Yegor,
>>>
>>> On 04/07/2022 14:28, Yegor Yefremov wrote:
>>>> Hi Roger,
>>>>
>>>> On Thu, Jun 30, 2022 at 1:22 PM Roger Quadros <rogerq at kernel.org> wrote:
>>>>>
>>>>> Hi Yegor,
>>>>>
>>>>> On 29/06/2022 17:23, Yegor Yefremov wrote:
>>>>>> Hi Roger,
>>>>>>
>>>>>> On Wed, Jun 29, 2022 at 3:44 PM Roger Quadros <rogerq at kernel.org> wrote:
>>>>>>>
>>>>>>> Hi Yegor,
>>>>>>>
>>>>>>> On 29/06/2022 14:33, Roger Quadros wrote:
>>>>>>>> Hi Yegor,
>>>>>>>>
>>>>>>>> On 28/06/2022 14:59, Yegor Yefremov wrote:
>>>>>>>>> On Tue, Jun 28, 2022 at 1:57 PM Yegor Yefremov
>>>>>>>>> <yegorslists at googlemail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Roger,
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 28, 2022 at 1:44 PM Roger Quadros <rogerq at kernel.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Yegor,
>>>>>>>>>>>
>>>>>>>>>>> On 28/06/2022 13:48, Yegor Yefremov wrote:
>>>>>>>>>>>> Since linux 5.17 I get the following issue when doing ubiformat:
>>>>>>>>>>>>
>>>>>>>>>>>> # ubiformat -y /dev/mtd5
>>>>>>>>>>>> ubiformat: mtd5 (nand), size 265945088 bytes (253.6 MiB), 2029
>>>>>>>>>>>> eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
>>>>>>>>>>>> libscan: scanning eraseblock 1097 -- 54 % complete  eth1 timed out to bring up
>>>>>>>>>>>> libscan: scanning eraseblock 2028 -- 100 % complete
>>>>>>>>>>>> ubiformat: 2001 eraseblocks have valid erase counter, mean value is 9
>>>>>>>>>>>> ubiformat: 2 eraseblocks are supposedly empty
>>>>>>>>>>>> ubiformat: 26 bad eraseblocks found, numbers: 3, 4, 5, 6, 8, 9, 10,
>>>>>>>>>>>> 11, 13, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 30,
>>>>>>>>>>>> 31, 32
>>>>>>>>>>>
>>>>>>>>>>> I'm guessing these bad blocks were recently added due to the offending patch?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>>> ubiformat: formatting eras[   33.644323] nand: nand_erase_nand:
>>>>>>>>>>>> attempt to erase a bad block at page 0x00000d40
>>>>>>>>>>>> ubiformat: formatting eraseblock 28[   33.658809] nand:
>>>>>>>>>>>> nand_erase_nand: attempt to erase a bad block at page 0x00000d80
>>>>>>>>>>>> ubiformat: formatting eraseblock 29 --  1 % [   33.674531] nand:
>>>>>>>>>>>> nand_erase_nand: attempt to erase a bad block at page 0x00000dc0
>>>>>>>>>>>> ubiformat: formatting eraseblock 30 --  1 % complete [   33.684508]
>>>>>>>>>>>> nand: nand_erase_nand: attempt to erase a bad block at page 0x00000e00
>>>>>>>>>>>> ubiformat: formatting eraseblock 34 --  1 % complete  libmtd: error!:
>>>>>>>>>>>> MEMERASE64 ioctl failed for eraseblock 34 (mtd5)
>>>>>>>>>>>>         error 5 (Input/output error)
>>>>>>>>>>>>
>>>>>>>>>>>> ubiformat: error!: failed to erase eraseblock 34
>>>>>>>>>>>>            error 5 (Input/output error)
>>>>>>>>>>>> ubiformat: marking block 34 bad
>>>>>>>>>>>> ubiformat: formatting eraseblock 35 --  1 % complete  libmtd: error!:
>>>>>>>>>>>> MEMERASE64 ioctl failed for eraseblock 35 (mtd5)
>>>>>>>>>>>>         error 5 (Input/output error)
>>>>>>>>>>>>
>>>>>>>>>>>> ubiformat: error!: failed to erase eraseblock 35
>>>>>>>>>>>>            error 5 (Input/output error)
>>>>>>>>>>>> ubiformat: marking block 35 bad
>>>>>>>>>>>> ubiformat: formatting eraseblock 36 --  1 % complete  libmtd: error!:
>>>>>>>>>>>> MEMERASE64 ioctl failed for eraseblock 36 (mtd5)
>>>>>>>>>>>>         error 5 (Input/output error)
>>>>>>>>>>>>
>>>>>>>>>>>> ubiformat: error!: failed to erase eraseblock 36
>>>>>>>>>>>>            error 5 (Input/output error)
>>>>>>>>>>>> ubiformat: marking block 36 bad
>>>>>>>>>>>> ubiformat: formatting eraseblock 37 --  1 % complete  libmtd: error!:
>>>>>>>>>>>> MEMERASE64 ioctl failed for eraseblock 37 (mtd5)
>>>>>>>>>>>>         error 5 (Input/output error)
>>>>>>>>>>>>
>>>>>>>>>>>> ubiformat: error!: failed to erase eraseblock 37
>>>>>>>>>>>>            error 5 (Input/output error)
>>>>>>>>>>>> ubiformat: marking block 37 bad
>>>>>>>>>>>>
>>>>>>>>>>>> ubiformat: error!: consecutive bad blocks exceed limit: 4, bad flash?
>>>>>>>>>>>> # [   36.322563] vwl1271: disabling
>>>>>>>>>>>>
>>>>>>>>>>>> git bisect pointed to the following commit:
>>>>>>>>>>>>
>>>>>>>>>>>> a9e849efca4f9c7732ea4a81f13ec96208994b22 is the first bad commit
>>>>>>>>>>>> commit a9e849efca4f9c7732ea4a81f13ec96208994b22
>>>>>>>>>>>> Author: Roger Quadros <rogerq at kernel.org>
>>>>>>>>>>>> Date:   Thu Dec 9 11:04:55 2021 +0200
>>>>>>>>>>>>
>>>>>>>>>>>>     mtd: rawnand: omap2: move to exec_op interface
>>>>>>>>>>>>
>>>>>>>>>>>>     Stop using legacy interface and move to the exec_op interface.
>>>>>>>>>>>>
>>>>>>>>>>>>     Signed-off-by: Roger Quadros <rogerq at kernel.org>
>>>>>>>>>>>>     Signed-off-by: Miquel Raynal <miquel.raynal at bootlin.com>
>>>>>>>>>>>>     Link: https://lore.kernel.org/linux-mtd/20211209090458.24830-4-rogerq@kernel.org
>>>>>>>>>>>>
>>>>>>>>>>>> :040000 040000 2341051b8aa8e6b554b8a44d2934f76d1aa460c4
>>>>>>>>>>>> c1727080ff16c403f4ad5ed840acc90127b632f8 M      drivers
>>>>>>>>>>>>
>>>>>>>>>>>> Info to my NAND flash:
>>>>>>>>>>>>
>>>>>>>>>>>> [    5.695760] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xda
>>>>>>>>>>>> [    5.702193] nand: Micron MT29F2G08ABAEAWP
>>>>>>>>>>>> [    5.706356] nand: 256 MiB, SLC, erase size: 128 KiB, page size:
>>>>>>>>>>>> 2048, OOB size: 64
>>>>>>>>>>>> [    5.714204] nand: using OMAP_ECC_BCH8_CODE_HW ECC scheme
>>>>>>>>>>>> [    5.719673] 6 cmdlinepart partitions found on MTD device omap2-nand.0
>>>>>>>>>>>> [    5.726232] Creating 6 MTD partitions on "omap2-nand.0":
>>>>>>>>>>>> [    5.731594] 0x000000000000-0x000000020000 : "SPL"
>>>>>>>>>>>> [    5.737788] mtdblock: MTD device 'SPL' is NAND, please consider
>>>>>>>>>>>> using UBI block devices instead.
>>>>>>>>>>>> [    5.750113] 0x000000020000-0x000000040000 : "SPL.backup1"
>>>>>>>>>>>> [    5.756916] mtdblock: MTD device 'SPL.backup1' is NAND, please
>>>>>>>>>>>> consider using UBI block devices instead.
>>>>>>>>>>>> [    5.769870] 0x000000040000-0x000000060000 : "SPL.backup2"
>>>>>>>>>>>> [    5.776695] mtdblock: MTD device 'SPL.backup2' is NAND, please
>>>>>>>>>>>> consider using UBI block devices instead.
>>>>>>>>>>>> [    5.789559] 0x000000060000-0x000000080000 : "SPL.backup3"
>>>>>>>>>>>> [    5.796423] mtdblock: MTD device 'SPL.backup3' is NAND, please
>>>>>>>>>>>> consider using UBI block devices instead.
>>>>>>>>>>>> [    5.809341] 0x000000080000-0x000000260000 : "u-boot"
>>>>>>>>>>>> [    5.816652] mtdblock: MTD device 'u-boot' is NAND, please consider
>>>>>>>>>>>> using UBI block devices instead.
>>>>>>>>>>>> [    5.829189] 0x000000260000-0x000010000000 : "UBI"
>>>>>>>>>>>> [    5.971508] mtdblock: MTD device 'UBI' is NAND, please consider
>>>>>>>>>>>> using UBI block devices instead.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What platform are you on?
>>>>>>>>>>> I do remember testing this on omap3-beagle but it does not use BCH8 ECC scheme.
>>>>>>>>>>
>>>>>>>>>> I am on am335x [1]
>>>>>>>>>>
>>>>>>>>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/am335x-baltos-ir5221.dts?h=v5.19-rc4
>>>>>>>>>
>>>>>>>>> NAND node definition [1]:
>>>>>>>>>
>>>>>>>>> &gpmc {
>>>>>>>>> pinctrl-names = "default";
>>>>>>>>> pinctrl-0 = <&nandflash_pins_s0>;
>>>>>>>>> ranges = <0 0 0x08000000 0x10000000>; /* CS0: NAND */
>>>>>>>>> status = "okay";
>>>>>>>>>
>>>>>>>>> nand@0,0 {
>>>>>>>>> compatible = "ti,omap2-nand";
>>>>>>>>> reg = <0 0 4>; /* CS0, offset 0, IO size 4 */
>>>>>>>>> interrupt-parent = <&gpmc>;
>>>>>>>>> interrupts = <0 IRQ_TYPE_NONE>, /* fifoevent */
>>>>>>>>>     <1 IRQ_TYPE_NONE>; /* termcount */
>>>>>>>>> rb-gpios = <&gpmc 0 GPIO_ACTIVE_HIGH>; /* gpmc_wait0 */
>>>>>>>>> nand-bus-width = <8>;
>>>>>>>>> ti,nand-ecc-opt = "bch8";
>>>>>>>>> ti,nand-xfer-type = "polled";
>>>>>>>>
>>>>>>>> Could you please change this to "prefetch-polled" and see if it fixes the issue?
>>>>>>>>
>>>>>>>
>>>>>>> I tried setting ti,nand-xfer-type to "polled" on a beagle-c4 board and could not reproduce the issue.
>>>>>>> I will need your help to debug this, please.
>>>>>>>
>>>>>>> Could you please apply the below patch on top of commit a9e849efca4f9c7732ea4a81f13ec96208994b22
>>>>>>> and send me the full kernel log and output of ubiformat command?
>>>>>>
>>>>>> I'll post the data later.
>>>>>>
>>>>>> The test with the "prefetch-polled" setting looks promising:
>>>>>>
>>>>>> 1. ubiformat runs without issues
>>>>>> 2. I can boot from NAND after "cat MLO > /dev/mtdblock0", etc.
>>>>>> 3. the kernel can mount UBIFS as rootfs
>>>>>>
>>>>>> The only issue I have for now, is that barebox fails to correctly
>>>>>> mount the first partition (the second with UBIFS rootfs - no problem).
>>>>>> This is how I write to NAND:
>>>>>>
>>>>>> ubiformat -y /dev/mtd5
>>>>>> ubiattach -p /dev/mtd5
>>>>>> ubimkvol /dev/ubi0 -N kernel -s 56MiB
>>>>>> mount -t ubifs ubi0:kernel /mnt
>>>>>> cp kernel-fit.itb /mnt
>>>>>> umount /mnt
>>>>>> ubimkvol /dev/ubi0 -N rootfs -s 180MiB
>>>>>> ubiupdatevol /dev/ubi0_1 rootfs.ubifs
>>>>>>
>>>>>> barebox log:
>>>>>>
>>>>>> Booting from NAND
>>>>>> ubi0: scanning is finished
>>>>>> ubi0: registering /dev/nand0.UBI.ubi
>>>>>> ubi0: registering kernel as /dev/nand0.UBI.ubi.kernel
>>>>>> ubi0: registering rootfs as /dev/nand0.UBI.ubi.rootfs
>>>>>> ubi0: attached mtd0 (name "nand0.UBI", size 253 MiB) to ubi0
>>>>>> ubi0: PEB size: 131072 bytes (128 KiB), LEB size: 129024 bytes
>>>>>> ubi0: min./max. I/O unit sizes: 2048/2048, sub-page size 512
>>>>>> ubi0: VID header offset: 512 (aligned 512), data offset: 2048
>>>>>> ubi0: good PEBs: 1999, bad PEBs: 30, corrupted PEBs: 0
>>>>>
>>>>> Note that we now have 30 bad PEBs. I suppose these are not
>>>>> really bad and we need to somehow clear bad block status for these.
>>>>
>>>> Do you mean using u-boot's "nand scrub"? So far, I haven't found any
>>>> other option. There are numerous threads on both the mtd and barebox
>>>> mailing lists but no implementation.
>>>>
>>>> Unfortunately, I don't have the initial BBT info. So let's hope the
>>>> system can handle this.
>>>
>>>
>>> "nand scrub" will mark all blocks as not bad, so it doesn't look like the best option.
>>> I was wondering if there is a better way to selectively mark individual blocks as not bad.
>>
>> Haven't found anything suitable so far.
>>
>>>>
>>>> Btw, I have applied your debug patch and executed a ubiformat command
>>>> but the debug messages weren't triggered.
>>>
>>> That is because you no longer see errors during nand erase. Did you try
>>> going back to ti,nand-xfer-type = "polled" ?
>>
>> I applied the patch to a9e849efca4f9c7732ea4a81f13ec96208994b22.
>> At that time our DTS still had the xfer type set to "polled", and the
>> ubiformat command failed as expected.
> 
> I think the issue is solved. The bootloader was actually complaining
> about the missing zstd support. I could see this with the latest
> barebox version (2022.06).
> 
> I've also switched to "ti,nand-xfer-type = "prefetch-dma";" as other DTS do.

Just to conclude:
1) The barebox issue was related to the barebox configuration (missing zstd support).
2) The NAND erase issue was fixed by switching to "prefetch-dma" or "prefetch-polled".
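
For reference, a minimal sketch of what such a DTS change might look like,
assuming the NAND node quoted earlier in this thread. Only the transfer type
changes; the comment is illustrative, and whether "prefetch-dma" or
"prefetch-polled" is the better choice for this board has not been settled
here:

```dts
&gpmc {
	nand@0,0 {
		/* was "polled"; erase failed with it after the exec_op conversion */
		ti,nand-xfer-type = "prefetch-dma";
	};
};
```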

Does the issue still happen with "polled"? If yes, it might be because the
GPMC timing values are too short for Ready/Busy signalling.
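
If you do retest with "polled", the page numbers in the kernel's
"attempt to erase a bad block at page ..." messages can be mapped back to
ubiformat's eraseblock numbers. Below is a small sketch in Python, assuming
the page number is chip-relative and using the geometry from your log
(2048-byte pages, 128 KiB eraseblocks, "UBI" partition starting at 0x260000).
The function name is only for illustration:

```python
# Geometry taken from the boot log quoted in this thread.
PAGE_SIZE = 2048            # bytes per page
ERASE_SIZE = 128 * 1024     # bytes per eraseblock
PART_OFFSET = 0x260000      # start of the "UBI" partition (mtd5)
PART_END = 0x10000000       # end of the "UBI" partition

PAGES_PER_BLOCK = ERASE_SIZE // PAGE_SIZE


def page_to_partition_block(page):
    """Map a chip-relative page number to an eraseblock index within mtd5.

    Assumption: the page number printed by the kernel is relative to the
    start of the chip, not to the start of the partition.
    """
    chip_block = page // PAGES_PER_BLOCK
    return chip_block - PART_OFFSET // ERASE_SIZE


# Sanity check against the size and eraseblock count ubiformat printed.
part_size = PART_END - PART_OFFSET
print(part_size, part_size // ERASE_SIZE)

# Pages taken from the four kernel messages in the original report.
for page in (0x0d40, 0x0d80, 0x0dc0, 0x0e00):
    print(hex(page), "-> eraseblock", page_to_partition_block(page))
```

Under that assumption the four pages map to eraseblocks 34 to 37, which are
the four blocks ubiformat failed to erase and then marked bad.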

Can you please send a patch with the fix? Thanks!

> 
> Thanks for your help.
> 
> Yegor

cheers,
-roger


