mtd: raw: nand: gpmi-nand data corruption @ v5.10.184

Kegl Rohit keglrohit at gmail.com
Wed Jun 21 10:55:47 PDT 2023


OK, looking at gpmi-nand.c in 5.10.184:

#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);

and then at 5.19 (source of the upstream patch, commit 0fddf9ad06fd9f439f137139861556671673e31c):
https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e31c#diff-0dec2fa8640ea2067789c406ab1e42c9805d0d0fc9f70a3a29d17f9311e23ca2L893

hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));

This could be the cause: the 5.19 code divides busy_timeout_cycles by
4096 (DIV_ROUND_UP), whereas 5.10.184 still multiplies it by 4096!

The backport is wrong because commit
cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted in the 5.10 kernel
tree but not on mainline.
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d

=> now in 5.10.184 this line "hw->timing1 ..." is wrong!
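
To make the difference concrete, here is a minimal sketch of the arithmetic
in plain C. It assumes that the BUSY_TIMEOUT field is the upper 16 bits of
TIMING1 (BP_GPMI_TIMING1_BUSY_TIMEOUT = 16, mask 0xffff << 16; these two
definitions are not quoted above, so treat them as an assumption) and that
the cycle count is a 32-bit unsigned value. The helper names timing1_mul(),
timing1_div() and busy_timeout_field() are made up for illustration and are
not in the driver.

```c
#include <stdint.h>

/* Assumption: 16-bit BUSY_TIMEOUT field located at bits 31:16 of TIMING1. */
#define BP_GPMI_TIMING1_BUSY_TIMEOUT 16
#define BM_GPMI_TIMING1_BUSY_TIMEOUT (0xffffu << BP_GPMI_TIMING1_BUSY_TIMEOUT)
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

/* Same rounding-up division as the kernel macro of that name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Form found in 5.10.184: cycle count multiplied by 4096 (hypothetical helper). */
uint32_t timing1_mul(uint32_t busy_timeout_cycles)
{
	return BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096u);
}

/* Form found in 5.19: cycle count divided by 4096, rounded up (hypothetical helper). */
uint32_t timing1_div(uint32_t busy_timeout_cycles)
{
	return BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096u));
}

/* Extract the 16-bit field back out of a TIMING1 value (hypothetical helper). */
uint32_t busy_timeout_field(uint32_t timing1)
{
	return (timing1 & BM_GPMI_TIMING1_BUSY_TIMEOUT) >> BP_GPMI_TIMING1_BUSY_TIMEOUT;
}
```

Under those assumptions, the multiplication form leaves 0 in the field for
busy_timeout_cycles = 350000 and 4096 for 1424705, because only the low 16
bits of the product survive the mask. The DIV_ROUND_UP form gives 86 and 348
for the same inputs. A field value of 0 would match han.xu's remark below
about the register ending up as 0.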

I will test this tomorrow.

On Wed, Jun 21, 2023 at 5:26 PM han.xu <han.xu at nxp.com> wrote:
>
> On 23/06/21 04:27PM, Kegl Rohit wrote:
> > Hello!
> >
> > Using imx7d and rt stable kernel tree.
> >
> > After upgrading to v5.10.184-rt90 the rootfs ubifs mtd partition got corrupted.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?h=v5.10.184-rt90
> >
> > After reverting the latest patch
> > (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did
> > not get corrupted.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=e4e4b24b42e710db058cc2a79a7cf16bf02b4915
> >
> > The commit message states the timeout calculation was changed.
> > Here are the calculated timeouts `busy_timeout_cycles` before (_old)
> > and after the patch (_new):
> >
> > [    0.491534] busy_timeout_cycles_old 4353
> > [    0.491604] busy_timeout_cycles_new 1424705
> > [    0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > [    0.492310] nand: Macronix MX30LF4G28AC
> > [    0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
> > [    0.492488] busy_timeout_cycles_old 4353
> > [    0.492493] busy_timeout_cycles_new 1424705
> > [    0.492863] busy_timeout_cycles_old 2510
> > [    0.492872] busy_timeout_cycles_new 350000
> >
> > The new timeouts are set a lot higher. Higher timeouts should not be
> > an issue. Lower timeouts could be an issue.
> > But because of these high timeouts gpmi-nand is broken for us.
> >
> > For now we simply reverted the change.
> > The new calculations seem to be flaky; a previous "fix backport" was
> > already reverted because of data corruption.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
> >
> > Any guesses why the high timeout causes issues?
>
> A high timeout combined with the wrong calculation may overflow and cause the
> DEVICE_BUSY_TIMEOUT register to end up as 0.
>
> >
> >
> > Thanks in advance!


