[BUG] pxa3xx: wait time out when scanning for bb

Miquel RAYNAL miquel.raynal at free-electrons.com
Fri Dec 1 00:15:39 PST 2017


> >>>>>> (I only see the timeouts if I remove the nand-on-flash-bbt)  
> >>>>> The nand-on-flash-bbt will read some of the last pages in you
> >>>>> NAND chip where a bad block table is supposed to be and derive
> >>>>> from that whether a block is bad or not. So this does only one
> >>>>> read. I guess you should have at least one timeout there?  
> >>>> Maybe, but the flash is fine we are running a rootfs in the NAND
> >>>> chip.  
> >>> So you can safely use the content of the NAND chip? Without any
> >>> timeout neither with reads nor writes? Can you try the mtd-utils
> >>> from [5]: nanddump/nandwrite or nandpagetest?
> >>>
> >>> Also, can you isolate the line that produces the timeouts?
> >>>
> >>> [5]http://www.linux-mtd.infradead.org/  
> >> Yes the NAND chip is working fine and stores our data.
> >>
> >> It is the command NAND_CMD_READOOB that causes it to timeout.  
> > Ok, I had a look at the nand_cmdfunc() function which is, I suppose,
> > the one that is in use (because you are using 2k pages) but I could
> > not see anything obvious. Is your setup special in some way?  
> Yes it's nand_cmdfunc()
> No a clean 4.14.0 kernel with a custom dts.
> >
> > Could you enable dynamic debug by adding "#define DEBUG" *before*
> > all #includes at the top of the pxa3xx_nand.c driver? It should
> > display all register accesses. Also, can you read the content of
> > NDCR and NDSR when it timeouts?
> >  
> [   32.765604] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():605 
> nand_writel(0x1, 0x0028)
> [   32.765609] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():625 
> nand_writel(0xfff, 0x0014)
> [   32.765614] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():626 
> nand_writel(0x0, 0x0000)
> [   32.765620] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():627 
> nand_writel(0xd1078000, 0x0000)

This is a write command request.

> [   32.765627] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x1
> [   32.765632] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():874 
> nand_writel(0x1, 0x0014)

The command is ready to be written in NDCB* registers (0x48).

> [   32.765637] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():888 
> nand_writel(0xd3000, 0x0048)
> [   32.765643] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():889 
> nand_writel(0x2060000, 0x0048)
> [   32.765648] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():890 
> nand_writel(0x0, 0x0048)
> [   32.765653] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():894 
> nand_writel(0x0, 0x0048)

"Command" registers are set:
- READ0/READSTART commands (double byte command)
- 5 address cycles: column is 0, page is 0x206 which is weird if this
  is a READOOB operation, where column should be something like 0x800
  (mtd->writesize).

> [   32.765677] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x800
> [   32.765682] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():874 
> nand_writel(0x800, 0x0014)
> [   32.765797] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x2
> [   32.765886] pxa3xx-nand f10d0000.flash:
> pxa3xx_nand_irq_thread():804 nand_writel(0x6, 0x0014)

Read data request received, the FIFO may be drawn.

> [   32.765893] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x500
> [   32.765899] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():874 
> nand_writel(0x500, 0x0014)
> [   32.765950] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():609 
> nand_writel(0x0, 0x0028)
> [   32.765956] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():625 
> nand_writel(0xfff, 0x0014)

Command done received, it means data was read correctly.

And this is the start of another "action".

> [   32.765961] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():626 
> nand_writel(0x0, 0x0000)
> [   32.765966] pxa3xx-nand f10d0000.flash: pxa3xx_nand_start():627 
> nand_writel(0x91078000, 0x0000)
> [   32.765974] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x1
> [   32.765979] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():874 
> nand_writel(0x1, 0x0014)

Same as before, command is ready to be written, single difference is
the use of the HW ECC engine. But, a few lines earlier, 0 was written
to NDECCCTRL (0x28), disabling BCH, which is weird because there we
will do an operation under Hamming ECC engine.

> [   32.765984] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():888 
> nand_writel(0xd3000, 0x0048)
> [   32.765989] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():889 
> nand_writel(0x2060000, 0x0048)
> [   32.765994] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():890 
> nand_writel(0x0, 0x0048)
> [   32.766000] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():894 
> nand_writel(0x0, 0x0048)

Same read operation as before.

> [   32.766022] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x800
> [   32.766028] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():874 
> nand_writel(0x800, 0x0014)
> [   32.766143] pxa3xx-nand f10d0000.flash: pxa3xx_nand_irq():824 
> nand_readl(0x0014) = 0x2
> [   32.766233] pxa3xx-nand f10d0000.flash:
> pxa3xx_nand_irq_thread():804 nand_writel(0x6, 0x0014)

Read data request received, I guess there is some ioread32_rep here
which is not traced and finally:

> [   32.970203] pxa3xx-nand f10d0000.flash: Wait time out!!!

Next lines are the error path.

> *[   32.975535] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():636 
> nand_readl(0x0014) = 0x0*
> *[   32.975540] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():637 
> nand_readl(0x0000) = 0x91078000*
> [   32.975546] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():639 
> nand_readl(0x0000) = 0x91078000
> [   32.975552] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():639 
> nand_readl(0x0000) = 0x91078000
> [   32.975559] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():639 
> nand_readl(0x0000) = 0x91078000
> [   32.975565] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():639 
> nand_readl(0x0000) = 0x91078000
> [   32.975572] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():645 
> nand_writel(0x81078000, 0x0000)
> [   32.975577] pxa3xx-nand f10d0000.flash: pxa3xx_nand_stop():651 
> nand_writel(0xfff, 0x0014)
> 
> I think I got one whole timeout sequence here :-)
> Register 0x0014 is NDSR and reg 0x0000 is NDCR, I have added a read
> of the NDSR register in the pxa3xx_nand_stop routine as highlighted
> above. It pussles me that the nand_start is called two times before
> the timeout, maybe it's okay.

Can you add traces there [1] to see which path is used ?

[1]
http://elixir.free-electrons.com/linux/latest/source/drivers/mtd/nand/pxa3xx_nand.c#L669


Thanks,
Miquèl

-- 
Miquel Raynal, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com



More information about the linux-mtd mailing list