mtd: raw: nand: gpmi-nand data corruption @ v5.10.184

Miquel Raynal miquel.raynal at bootlin.com
Mon Jun 26 03:56:12 PDT 2023


Hi Kegl,

keglrohit at gmail.com wrote on Sun, 25 Jun 2023 11:11:52 +0200:

> Hello!
> 
> Following up on the initial discussion
> https://lore.kernel.org/all/20220701110341.3094023-1-s.hauer@pengutronix.de
> which caused the revert commit:
> Are there any plans to fix this issue for 5.10.y (and maybe other
> stable branches)?

If the Fixes tags are right, all relevant branches which are still
maintained should see the final fix applied. If that's not the case, it
means the stable maintainers could not apply the patch as-is and set it
aside. In that case you are welcome to adapt the official patch to
the branch(es) of interest and send it to the stable team, mentioning
the upstream commit (see the documentation about how to ask for
backporting patches to stable branches).

Thanks,
Miquèl

> 
> Thanks in advance!
> 
> On Thu, Jun 22, 2023 at 6:46 AM Kegl Rohit <keglrohit at gmail.com> wrote:
> >
> > After reverting the revert :), the data corruption did not happen anymore!
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
> >
> > On Wed, Jun 21, 2023 at 7:55 PM Kegl Rohit <keglrohit at gmail.com> wrote:  
> > >
> > > ok, looking at the 5.10.184 gpmi-nand.c:
> > >
> > > #define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
> > > (((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)
> > >
> > > hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
> > >
> > > and then 5.19 (upstream patch source 0fddf9ad06fd9f439f137139861556671673e31c)
> > > https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e31c#diff-0dec2fa8640ea2067789c406ab1e42c9805d0d0fc9f70a3a29d17f9311e23ca2L893
> > >
> > > hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles,
> > > 4096));
> > >
> > > could be the cause. DIV_ROUND_UP is most likely a division and
> > > busy_timeout_cycles * 4096 a multiplication!
> > >
> > > The backport is wrong, because on the 5.10 kernel tree commit
> > > cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted, but on mainline
> > > it was not.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
> > >  
> > > => now in 5.10.184 this line "hw->timing1 ..." is wrong!  
> > >
> > >  I will test this tomorrow.
> > >
> > > On Wed, Jun 21, 2023 at 5:26 PM han.xu <han.xu at nxp.com> wrote:  
> > > >
> > > > On 23/06/21 04:27PM, Kegl Rohit wrote:  
> > > > > Hello!
> > > > >
> > > > > Using imx7d and rt stable kernel tree.
> > > > >
> > > > > After upgrading to v5.10.184-rt90 the rootfs ubifs mtd partition got corrupted.
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?h=v5.10.184-rt90
> > > > >
> > > > > After reverting the latest patch
> > > > > (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did
> > > > > not get corrupted.
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=e4e4b24b42e710db058cc2a79a7cf16bf02b4915
> > > > >
> > > > > The commit message states the timeout calculation was changed.
> > > > > Here are the calculated timeouts `busy_timeout_cycles` before (_old)
> > > > > and after the patch (_new):
> > > > >
> > > > > [    0.491534] busy_timeout_cycles_old 4353
> > > > > [    0.491604] busy_timeout_cycles_new 1424705
> > > > > [    0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > > > > [    0.492310] nand: Macronix MX30LF4G28AC
> > > > > [    0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
> > > > > [    0.492488] busy_timeout_cycles_old 4353
> > > > > [    0.492493] busy_timeout_cycles_new 1424705
> > > > > [    0.492863] busy_timeout_cycles_old 2510
> > > > > [    0.492872] busy_timeout_cycles_new 350000
> > > > >
> > > > > The new timeouts are set a lot higher. Higher timeouts should not be
> > > > > an issue. Lower timeouts could be an issue.
> > > > > But because of these high timeouts gpmi-nand is broken for us.
> > > > >
> > > > > For now we simply reverted the change.
> > > > > The new calculations seem to be flaky, a previous "fix backport" was
> > > > > already reverted because of data corruption.
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
> > > > >
> > > > > Any guesses why the high timeout causes issues?  
> > > >
> > > > A high timeout combined with the wrong calculation may overflow and cause the
> > > > DEVICE_BUSY_TIMEOUT register to become 0.
> > > >  
> > > > >
> > > > >
> > > > > Thanks in advance!
> > > > >
> > > > > ______________________________________________________
> > > > > Linux MTD discussion mailing list
> > > > > http://lists.infradead.org/mailman/listinfo/linux-mtd/  
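
The overflow han.xu describes, and the multiplication-versus-division
difference Kegl points out, can be reproduced outside the kernel. The
following is only a sketch under stated assumptions: a 16-bit
BUSY_TIMEOUT field in bits 31:16 of TIMING1, and 32-bit unsigned
arithmetic. The macro values and the helper names
(busy_timeout_field_mul, busy_timeout_field_div, self_check) are
hypothetical and not taken from the driver; only the cycle counts come
from the log above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 16-bit field in bits 31:16 (not copied from the driver). */
#define BP_BUSY_TIMEOUT 16
#define BM_BUSY_TIMEOUT (0xffffu << BP_BUSY_TIMEOUT)
#define BF_BUSY_TIMEOUT(v) \
	(((uint32_t)(v) << BP_BUSY_TIMEOUT) & BM_BUSY_TIMEOUT)

/* Same rounding idea as the kernel's DIV_ROUND_UP, redefined here. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Field value that results from "busy_timeout_cycles * 4096". */
static uint32_t busy_timeout_field_mul(uint32_t cycles)
{
	return BF_BUSY_TIMEOUT(cycles * 4096u) >> BP_BUSY_TIMEOUT;
}

/* Field value that results from "DIV_ROUND_UP(busy_timeout_cycles, 4096)". */
static uint32_t busy_timeout_field_div(uint32_t cycles)
{
	return BF_BUSY_TIMEOUT(DIV_ROUND_UP(cycles, 4096u)) >> BP_BUSY_TIMEOUT;
}

/* Checks the four cycle counts reported in the boot log. */
static void self_check(void)
{
	/* Multiplying keeps only the low 4 bits of the count, shifted up by 12. */
	assert(busy_timeout_field_mul(4353u) == 4096u);
	assert(busy_timeout_field_mul(2510u) == 57344u);
	assert(busy_timeout_field_mul(1424705u) == 4096u);
	assert(busy_timeout_field_mul(350000u) == 0u); /* field ends up 0 */

	/* Dividing yields small, monotonic values that fit in 16 bits. */
	assert(busy_timeout_field_div(4353u) == 2u);
	assert(busy_timeout_field_div(2510u) == 1u);
	assert(busy_timeout_field_div(1424705u) == 348u);
	assert(busy_timeout_field_div(350000u) == 86u);
}
```

Under these assumptions, the multiplication leaves the field depending
only on the low four bits of the cycle count, so the old count 2510
happens to produce a large field value while the new count 350000
produces 0. That is consistent with the larger cycle counts only
working once the division from the upstream commit is in place.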


