mtd: raw: nand: gpmi-nand data corruption @ v5.10.184
Kegl Rohit
keglrohit at gmail.com
Wed Jun 21 10:55:47 PDT 2023
ok, looking at the 5.10.184 gpmi-nand.c:
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)
hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096);
and then 5.19 (upstream patch source 0fddf9ad06fd9f439f137139861556671673e31c)
https://github.com/gregkh/linux/commit/0fddf9ad06fd9f439f137139861556671673e31c#diff-0dec2fa8640ea2067789c406ab1e42c9805d0d0fc9f70a3a29d17f9311e23ca2L893
hw->timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096));
This could be the cause. DIV_ROUND_UP performs a division, whereas
5.10.184 multiplies busy_timeout_cycles by 4096!
The backport is wrong because commit
cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d was reverted in the 5.10 kernel
tree but not on mainline.
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
=> now in 5.10.184 this line "hw->timing1 ..." is wrong!
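To illustrate the suspected truncation, here is a minimal user-space sketch.
It assumes BUSY_TIMEOUT is a 16-bit field at bits 31:16 of TIMING1 (the BP_/BM_
values below are my assumption, please check gpmi-regs.h in your tree) and that
the arithmetic is done in 32 bits. field_mul() and field_div() are hypothetical
helper names, not driver functions.

```c
#include <stdint.h>

/* Assumed field layout: 16 bits wide, starting at bit 16. */
#define BP_GPMI_TIMING1_BUSY_TIMEOUT	16
#define BM_GPMI_TIMING1_BUSY_TIMEOUT	(0xffffu << BP_GPMI_TIMING1_BUSY_TIMEOUT)
#define BF_GPMI_TIMING1_BUSY_TIMEOUT(v) \
	(((v) << BP_GPMI_TIMING1_BUSY_TIMEOUT) & BM_GPMI_TIMING1_BUSY_TIMEOUT)

/* Same rounding as the kernel macro of this name. */
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Field value as written by the 5.10.184 line (multiplication). */
static uint32_t field_mul(uint32_t busy_timeout_cycles)
{
	uint32_t timing1 = BF_GPMI_TIMING1_BUSY_TIMEOUT(busy_timeout_cycles * 4096u);

	return timing1 >> BP_GPMI_TIMING1_BUSY_TIMEOUT;
}

/* Field value as written by the mainline line (rounded-up division). */
static uint32_t field_div(uint32_t busy_timeout_cycles)
{
	uint32_t timing1 =
		BF_GPMI_TIMING1_BUSY_TIMEOUT(DIV_ROUND_UP(busy_timeout_cycles, 4096u));

	return timing1 >> BP_GPMI_TIMING1_BUSY_TIMEOUT;
}
```

Under these assumptions, multiplying by 4096 clears the low 12 bits, so only
the lowest 4 bits of busy_timeout_cycles survive the 16-bit mask. For the
logged value 350000 (0x55730), field_mul() gives 0 while field_div() gives 86,
which would match the DEVICE_BUSY_TIMEOUT = 0 explanation quoted below.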
I will test this tomorrow.
On Wed, Jun 21, 2023 at 5:26 PM han.xu <han.xu at nxp.com> wrote:
>
> On 23/06/21 04:27PM, Kegl Rohit wrote:
> > Hello!
> >
> > Using imx7d and rt stable kernel tree.
> >
> > After upgrading to v5.10.184-rt90 the rootfs ubifs mtd partition got corrupted.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tag/?h=v5.10.184-rt90
> >
> > After reverting the latest patch
> > (e4e4b24b42e710db058cc2a79a7cf16bf02b4915), the rootfs partition did
> > not get corrupted.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=e4e4b24b42e710db058cc2a79a7cf16bf02b4915
> >
> > The commit message states the timeout calculation was changed.
> > Here are the calculated timeouts `busy_timeout_cycles` before (_old)
> > and after the patch (_new):
> >
> > [ 0.491534] busy_timeout_cycles_old 4353
> > [ 0.491604] busy_timeout_cycles_new 1424705
> > [ 0.492300] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xdc
> > [ 0.492310] nand: Macronix MX30LF4G28AC
> > [ 0.492316] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 112
> > [ 0.492488] busy_timeout_cycles_old 4353
> > [ 0.492493] busy_timeout_cycles_new 1424705
> > [ 0.492863] busy_timeout_cycles_old 2510
> > [ 0.492872] busy_timeout_cycles_new 350000
> >
> > The new timeouts are set a lot higher. Higher timeouts should not be
> > an issue. Lower timeouts could be an issue.
> > But because of these high timeouts, gpmi-nand is broken for us.
> >
> > For now we simply reverted the change.
> > The new calculations seem to be flaky; a previous "fix backport" was
> > already reverted because of data corruption.
> > https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/commit/drivers/mtd/nand/raw/gpmi-nand/gpmi-nand.c?h=v5.10.184-rt90&id=cc5ee0e0eed0bec2b7cc1d0feb9405e884eace7d
> >
> > Any guesses why the high timeout causes issues?
>
> A high timeout combined with the wrong calculation may overflow and cause the
> DEVICE_BUSY_TIMEOUT register to become 0.
>
> >
> >
> > Thanks in advance!
> >