[NAND] Question regarding -EIO error
Lukasz Majewski
lukma at denx.de
Mon Nov 13 12:27:01 PST 2017
Dear All,
I have been investigating an -EIO error on page writes, on kernels from
2.6.26 up to 4.14-rc7.
A foreword:
-----------
Before the commit (v4.4):
mtd: nand: increase ready wait timeout and report timeouts [1]
b70af9bef49bd9a5f4e7a2327d9074e29653e665
The timeout for a NAND memory write (nand_write_page()) was ignored (as
mentioned in [1]).
The nand_write_page() (@nand_base.c) only checks for NAND_STATUS_FAIL
(and returns -EIO).
In the old days it also used CONFIG_MTD_NAND_VERIFY_WRITE to check
whether the correct data had been written (if not, -EIO was returned
immediately).
This was removed with [2]:
"mtd: kill MTD_NAND_VERIFY_WRITE"
657f28f8811c92724db10d18bbbec70d540147d6
The commit:
"mtd: nand_wait: warn if the nand is busy on exit"
f251b8dfdd0721255ea11751cdc282834e43b74e
added WARN_ON() on timeout.
Setup:
-----
I've run mtd_*.ko tests on several kernels and two memories.
Running the mtd_torturetest module (with the timeout set to 20ms):
modprobe mtd_torturetest dev=${device} check=1 cycles_count=100 gran=10
forces both memories to time out (at a random point of execution), with
-EIO returned.
Please correct me if I'm wrong:
-------------------------------
With the new kernel (v4.14-rc7) we rely on:
1. The page write timeout being increased from 20ms -> 400ms (as in [1])
2. The WARN_ON() being displayed when we leave nand_wait() with an
ongoing NAND controller operation.
3. As written in [2], the correctness of the written data being checked
in the upper layers (fs) -> this covers the case when the memory
reports no failure, but its internal controller is still writing data.
Problem:
--------
Normally, to exit the nand_wait() loop, I read the RnB GPIO pin
(chip->dev_ready).
When the timeout expires, the status read from one memory is 0x81.
The second one reports no error (0x80), but the written-data check
fails.
According to the spec, bits 5 and 6 (of the status register) are both
0 -> internal data operation busy and overall busy.
The problem here is that we exit nand_wait() with the NAND memory
controller still busy. The timeout change [1] from 20ms -> 400ms just
'masked' this issue.
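To make the two status values above easier to read, here is a minimal
userspace sketch that decodes them. The bit masks follow the
NAND_STATUS_* definitions from the kernel NAND header; the struct and
the decode_status() helper are hypothetical names used only for
illustration, they are not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status register bits (same values as the kernel's NAND_STATUS_* macros). */
#define NAND_STATUS_FAIL        0x01  /* last program/erase reported a failure */
#define NAND_STATUS_TRUE_READY  0x20  /* bit 5: internal data operation finished */
#define NAND_STATUS_READY       0x40  /* bit 6: device ready for a new command */
#define NAND_STATUS_WP          0x80  /* bit 7: set when NOT write protected */

/* Hypothetical helper type, for illustration only. */
struct nand_status_view {
	bool fail;         /* bit 0 */
	bool array_ready;  /* bit 5 */
	bool ready;        /* bit 6 */
	bool writable;     /* bit 7 */
};

/* Split a raw status byte into its individual flags. */
static struct nand_status_view decode_status(uint8_t status)
{
	struct nand_status_view v = {
		.fail        = (status & NAND_STATUS_FAIL) != 0,
		.array_ready = (status & NAND_STATUS_TRUE_READY) != 0,
		.ready       = (status & NAND_STATUS_READY) != 0,
		.writable    = (status & NAND_STATUS_WP) != 0,
	};
	return v;
}

/* True when the chip still reports an operation in progress. */
static bool still_busy(uint8_t status)
{
	struct nand_status_view v = decode_status(status);

	return !v.ready || !v.array_ready;
}
```

With this, decode_status(0x81) gives fail=1 with both ready bits
cleared, and decode_status(0x80) gives fail=0 with both ready bits
cleared, so still_busy() is true for both values.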
Question:
---------
Shouldn't we wait longer (in nand_wait()) for the internal operations
to finish?
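To illustrate what I mean, below is a rough userspace sketch of a wait
loop that only looks at the FAIL bit once the READY bit is set, and
reports a timeout separately from a failure. This is NOT the kernel's
nand_wait(); wait_ready(), the read_status callback, struct fake_chip
and the poll counts are all assumptions made up for this example (a
poll count stands in for the timeout in ms).

```c
#include <stdint.h>

#define NAND_STATUS_FAIL   0x01
#define NAND_STATUS_READY  0x40

/* Hypothetical result codes, to keep "timed out" apart from "failed". */
enum wait_result { WAIT_OK = 0, WAIT_FAILED = -1, WAIT_TIMEOUT = -2 };

/* Hypothetical callback returning the current status register value. */
typedef uint8_t (*read_status_fn)(void *ctx);

/*
 * Poll the status until READY is set or max_polls is exhausted.
 * The FAIL bit is evaluated only after the chip reports ready.
 */
static enum wait_result wait_ready(read_status_fn read_status, void *ctx,
				   unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		uint8_t status = read_status(ctx);

		if (status & NAND_STATUS_READY)
			return (status & NAND_STATUS_FAIL) ? WAIT_FAILED : WAIT_OK;
	}
	return WAIT_TIMEOUT;  /* chip was still busy when we gave up */
}

/* Simulated chip: busy (0x80) for busy_polls reads, then final_status. */
struct fake_chip {
	unsigned int busy_polls;
	uint8_t final_status;
};

static uint8_t fake_read_status(void *ctx)
{
	struct fake_chip *chip = ctx;

	if (chip->busy_polls > 0) {
		chip->busy_polls--;
		return 0x80;  /* not write protected, still busy */
	}
	return chip->final_status;
}
```

A fake chip that needs 50 polls times out with max_polls = 20 and
completes with max_polls = 400, which mirrors the 20ms vs. 400ms
behaviour described above: the longer limit does not fix anything, it
only gives the chip enough time to finish.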
To reproduce:
-------------
Change the timeout value back from 400ms to 20ms and run the mtd_*.ko
tests.
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd at denx.de