mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
Kegl Rohit
keglrohit at gmail.com
Wed Jun 21 21:46:20 PDT 2023
After reverting the revert :), the data corruption did not happen anymore!
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
On Wed, Jun 21, 2023 at 7:55 PM Kegl Rohit <keglrohit at gmail.com> wrote:
>
> ok, looking at the 5.10.184 gpmi-nand.c:
>
> #define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
> (((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)
>
> hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
>
> and then 5.19 (upstream patch source 0fddf9ad06fd9f439f137139861556671673e31c)
> https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e31c#diff-0dec2fa8640ea2067789c406ab1e42c9805d0d0fc9f70a3a29d17f9311e23ca2L893
>
> hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles,
> 4096));
>
> could be the cause. DIV_ROUND_UP divides busy_timeout_cycles by 4096,
> while the 5.10.184 line multiplies it by 4096!
>
> The backport is wrong, because in the 5.10 kernel tree commit
> cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted, but on mainline
> it was not.
> https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
>
> => now in 5.10.184 this line "hw->timing1 ..." is wrong!
>
> I will test this tomorrow.
>
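To make the comparison above concrete, here is a minimal userspace sketch of the two encodings. The field position and mask (a 16-bit field at bits 31:16) are assumptions, since the thread only quotes the BF_ macro itself. The helper names busy_timeout_field, field_with_multiply and field_with_divide are made up for illustration, and DIV_ROUND_UP here is a stand-in for the kernel macro.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 16-bit BUSY_TIMEOUT field at bits 31:16 of TIMING1.
 * The thread quotes only the BF_ macro, not these two definitions. */
#define BP_GPMI_TIMING1_BUSY_TIMEOUT 16
#define BM_GPMI_TIMING1_BUSY_TIMEOUT (0xffffu << BP_GPMI_TIMING1_BUSY_TIMEOUT)
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
    (((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

/* Userspace stand-in for the kernel's DIV_ROUND_UP(). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Document the assumed layout at compile time. */
static_assert(BM_GPMI_TIMING1_BUSY_TIMEOUT == 0xffff0000u,
              "sketch assumes a 16-bit field at bits 31:16");

/* Hypothetical helper: read the field back out of a TIMING1 value. */
uint32_t busy_timeout_field(uint32_t timing1)
{
    return (timing1 & BM_GPMI_TIMING1_BUSY_TIMEOUT)
           >> BP_GPMI_TIMING1_BUSY_TIMEOUT;
}

/* Encoding as quoted from 5.10.184: cycles * 4096, then shift and mask.
 * Only the low 16 bits of the product survive the shift and mask. */
uint32_t field_with_multiply(uint32_t cycles)
{
    uint32_t timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(cycles * 4096u);
    return busy_timeout_field(timing1);
}

/* Encoding as quoted from 5.19: cycles divided by 4096, rounded up. */
uint32_t field_with_divide(uint32_t cycles)
{
    uint32_t timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(cycles, 4096u));
    return busy_timeout_field(timing1);
}
```

Under these assumptions, the logged value 350000 encodes to a field of 0 with the multiplication and to 86 with DIV_ROUND_UP, while 1424705 encodes to 0x1000 versus 348. With the multiplication, only the low 16 bits of the product survive, so the field no longer tracks the requested cycle count.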
> On Wed, Jun 21, 2023 at 5:26 PM han.xu <han.xu at nxp.com> wrote:
> >
> > On 23/06/21 04:27PM, Kegl Rohit wrote:
> > > Hello!
> > >
> > > Using imx7d and rt stable kernel tree.
> > >
> > > After upgrading to v5.10.184-rt90 the rootfs ubifs mtd partition got corrupted.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?h=v5.10.184-rt90
> > >
> > > After reverting the latest patch
> > > (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did
> > > not get corrupted.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=e4e4b24b42e710db058cc2a79a7cf16bf02b4915
> > >
> > > The commit message states the timeout calculation was changed.
> > > Here are the calculated timeouts `busy_timeout_cycles` before (_old)
> > > and after the patch (_new):
> > >
> > > [ 0.491534] busy_timeout_cycles_old 4353
> > > [ 0.491604] busy_timeout_cycles_new 1424705
> > > [ 0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > > [ 0.492310] nand: Macronix MX30LF4G28AC
> > > [ 0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
> > > [ 0.492488] busy_timeout_cycles_old 4353
> > > [ 0.492493] busy_timeout_cycles_new 1424705
> > > [ 0.492863] busy_timeout_cycles_old 2510
> > > [ 0.492872] busy_timeout_cycles_new 350000
> > >
> > > The new timeouts are a lot higher. Higher timeouts should not be
> > > an issue; lower timeouts could be.
> > > But with these high timeouts gpmi-nand is broken for us.
> > >
> > > For now we simply reverted the change.
> > > The new calculations seem to be flaky; a previous "fix backport" was
> > > already reverted because of data corruption.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
> > >
> > > Any guesses why the high timeout causes issues?
> >
> > A high timeout combined with the wrong calculation may overflow and cause
> > the DEVICE_BUSY_TIMEOUT register to end up as 0.
> >
> > >
> > >
> > > Thanks in advance!
> > >
> > > ______________________________________________________
> > > Linux MTD discussion mailing list
> > > http://lists.infradead.org/mailman/listinfo/linux-mtd/