NAND timeout issues with blank chip and Marvell NFC
Chris Packham
Chris.Packham at alliedtelesis.co.nz
Mon Apr 23 22:31:39 PDT 2018
Hi,
We're in the process of qualifying new NAND chips (Macronix
MX30LF2G18AC) for one of our Armada-385 based devices and we're
experiencing some long startup times on units with factory fresh NAND
chips. Anecdotally I think I've also seen this behaviour on the old
chips as well (Micron MT29F2G08ABAEAWP-ITX:E).
On 4.17.0-rc2, with the newly rewritten NAND infrastructure, we see:
nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xda
nand: Macronix MX30LF2G18AC
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
marvell-nfc f10d0000.flash: Timeout on CMDD (NDSR: 0x00000080)
marvell-nfc f10d0000.flash: Timeout on CMDD (NDSR: 0x00000280)
Bad block table not found for chip 0
Bad block table not found for chip 0
Scanning device for bad blocks
(nothing for some time)
On an older kernel we see:
pxa3xx-nand f10d0000.flash: This platform can't do DMA on this device
nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xda
nand: Macronix MX30LF2G18AC
nand: 256 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
pxa3xx-nand f10d0000.flash: ECC strength 16, ECC step size 2048
Bad block table not found for chip 0
Bad block table not found for chip 0
Scanning device for bad blocks
pxa3xx-nand f10d0000.flash: Wait time out!!!
pxa3xx-nand f10d0000.flash: Wait time out!!!
pxa3xx-nand f10d0000.flash: Wait time out!!!
pxa3xx-nand f10d0000.flash: Wait time out!!!
pxa3xx-nand f10d0000.flash: Wait time out!!!
...
(time outs continue for some time)
Presumably the new driver in 4.17.0-rc2 is hitting the same wait
timeouts but just not logging them.
If we leave the system running long enough (on the order of 30 minutes)
things seem to sort themselves out and bootup continues. Subsequent
boots are fine. If we run 'nand erase.chip' from u-boot on a fresh unit
and then boot into the kernel, things are also fine.
If we run 'nand scrub.chip -y' from u-boot we are able to re-create the
problem.
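As a rough back-of-envelope check, the chip geometry from the logs above
(256 MiB, 128 KiB erase size) and the roughly 30 minute stall can be
turned into a per-block figure. This is only a sketch: it assumes the
whole delay is spent in the bad block scan and is spread evenly over
every erase block, which the logs suggest but do not prove. The function
names are made up for illustration.

```python
# Back-of-envelope estimate only. Assumption: the ~30 minute stall is
# spent entirely in the bad block scan, evenly across all erase blocks.

CHIP_SIZE = 256 * 1024 * 1024   # 256 MiB, from the kernel log
ERASE_SIZE = 128 * 1024         # 128 KiB erase block, from the kernel log
STALL_SECONDS = 30 * 60         # "on the order of 30 minutes"


def erase_block_count(chip_size: int, erase_size: int) -> int:
    """Number of erase blocks the scan has to visit."""
    return chip_size // erase_size


def seconds_per_block(stall_seconds: float, blocks: int) -> float:
    """Average time spent per erase block under the assumption above."""
    return stall_seconds / blocks


blocks = erase_block_count(CHIP_SIZE, ERASE_SIZE)
print(f"{blocks} erase blocks")
print(f"~{seconds_per_block(STALL_SECONDS, blocks):.2f} s per block")
```

Under those assumptions this works out to 2048 blocks and a bit under a
second per block, which would be consistent with one or two timed-out
operations per block rather than a single long hang.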
Our suspicion is that the erased state of the chip does not agree with
what the kernel expects in the ECC data, the bad block table location,
or both. Erasing it from u-boot must leave valid data in the expected
places, after which the kernel is happy.
We could update our manufacturing procedures to run 'nand erase.chip'
before the first boot, but this feels wrong. Some of our devices boot
over the network, so the NAND is not normally touched by the bootloader.
It seems that there is some unhandled error condition that is stopping
the kernel from seeing that the chip is completely blank and making
forward progress.
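To illustrate what "seeing that the chip is completely blank" could mean
in practice, here is a minimal user-space sketch of a blank-page check:
count the zero bits in the data and OOB areas and treat the page as
erased if there are no more than a small number of bitflips. The kernel
has a helper in this spirit, nand_check_erased_ecc_chunk(), but this
sketch is not that code, and the logs above do not show whether either
driver reaches such a check on this path. The function names and the
threshold of 16 (borrowed from the "ECC strength 16" log line) are
assumptions for illustration only.

```python
# Illustrative sketch only, not the kernel's implementation.
# An erased NAND page reads back as all 0xFF; a few bits may have
# flipped to 0, so a small number of zero bits is tolerated.

PAGE_SIZE = 2048       # from the kernel log
OOB_SIZE = 64          # from the kernel log
MAX_BITFLIPS = 16      # assumption: reuse the reported ECC strength


def count_zero_bits(buf: bytes) -> int:
    """Count bits that are 0 across the whole buffer."""
    return sum(8 - bin(b).count("1") for b in buf)


def looks_erased(data: bytes, oob: bytes, max_bitflips: int) -> bool:
    """True if data plus OOB are all 0xFF apart from a few bitflips."""
    return count_zero_bits(data) + count_zero_bits(oob) <= max_bitflips


blank_data = bytes([0xFF]) * PAGE_SIZE
blank_oob = bytes([0xFF]) * OOB_SIZE
print(looks_erased(blank_data, blank_oob, MAX_BITFLIPS))

# One flipped bit in the data area should still count as erased.
flipped = bytes([0xFE]) + blank_data[1:]
print(looks_erased(flipped, blank_oob, MAX_BITFLIPS))

# A page full of zeros is clearly not erased.
print(looks_erased(bytes(PAGE_SIZE), blank_oob, MAX_BITFLIPS))
```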
Has anyone else seen something like this before? Any thoughts as to how
we can avoid the long delay?
Thanks
Chris