[PATCH v2 2/2] mtd: rawnand: gpmi: Fix the random DMA timeout issue

Miquel Raynal miquel.raynal at bootlin.com
Thu Dec 10 16:39:06 EST 2020


On Wed, 2020-12-09 at 03:51:04 UTC, Han Xu wrote:
> To get better performance, current gpmi driver collected and chained all
> small DMA transfers in gpmi_nfc_exec_op, the whole chain triggered and
> wait for complete at the end.
> 
> But some random DMA timeout found in this new driver, with the help of
> ftrace, we found the root cause is as follows:
> 
> Take gpmi_ecc_read_page() as an example, gpmi_nfc_exec_op collected 6
> DMA transfers and the DMA chain triggered at the end. It waits for bch
> completion and check jiffies if it's timeout. The typical function graph
> shown below,
> 
>    63.216351 |   1)               |  gpmi_ecc_read_page() {
>    63.216352 |   1)   0.750 us    |    gpmi_bch_layout_std();
>    63.216354 |   1)               |    gpmi_nfc_exec_op() {
>    63.216355 |   1)               |      gpmi_chain_command() {
>    63.216356 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.216357 |   1)               |          /* mxs chan ccw idx: 0 */
>    63.216358 |   1)   1.750 us    |        }
>    63.216359 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.216360 |   1)               |          /* mxs chan ccw idx: 1 */
>    63.216361 |   1)   2.000 us    |        }
>    63.216361 |   1)   6.500 us    |      }
>    63.216362 |   1)               |      gpmi_chain_command() {
>    63.216363 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.216364 |   1)               |          /* mxs chan ccw idx: 2 */
>    63.216365 |   1)   1.750 us    |        }
>    63.216366 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.216367 |   1)               |          /* mxs chan ccw idx: 3 */
>    63.216367 |   1)   1.750 us    |        }
>    63.216368 |   1)   5.875 us    |      }
>    63.216369 |   1)               |      /* gpmi_chain_wait_ready */
>    63.216370 |   1)               |      mxs_dma_prep_slave_sg() {
>    63.216372 |   1)               |        /* mxs chan ccw idx: 4 */
>    63.216373 |   1)   3.000 us    |      }
>    63.216374 |   1)               |      /* gpmi_chain_data_read */
>    63.216376 |   1)               |      mxs_dma_prep_slave_sg() {
>    63.216377 |   1)               |        /* mxs chan ccw idx: 5 */
>    63.216378 |   1)   2.000 us    |      }
>    63.216379 |   1)   1.125 us    |      mxs_dma_tx_submit();
>    63.216381 |   1)   1.000 us    |      mxs_dma_enable_chan();
>    63.216712 |   0)   2.625 us    |  mxs_dma_int_handler();
>    63.216717 |   0)   4.250 us    |  bch_irq();
>    63.216723 |   0)   1.250 us    |  mxs_dma_tasklet();
>    63.216723 |   1)               |      /* jiffies left 250 */
>    63.216725 |   1) ! 372.000 us  |    }
>    63.216726 |   1)   2.625 us    |    gpmi_count_bitflips();
>    63.216730 |   1) ! 379.125 us  |  }
> 
> but it's not gurantee that bch irq handled always after dma irq handled,
> sometimes bch_irq comes first and gpmi_nfc_exec_op won't wait anymore,
> another gpmi_nfc_exec_op may get invoked before last DMA chain IRQ
> handled, this messed up the next DMA chain and causes DMA timeout. Check
> the trace log when issue happened.
> 
>    63.218923 |   1)               |  gpmi_ecc_read_page() {
>    63.218924 |   1)   0.625 us    |    gpmi_bch_layout_std();
>    63.218926 |   1)               |    gpmi_nfc_exec_op() {
>    63.218927 |   1)               |      gpmi_chain_command() {
>    63.218928 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.218929 |   1)               |          /* mxs chan ccw idx: 0 */
>    63.218929 |   1)   1.625 us    |        }
>    63.218931 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.218931 |   1)               |          /* mxs chan ccw idx: 1 */
>    63.218932 |   1)   1.750 us    |        }
>    63.218933 |   1)   5.875 us    |      }
>    63.218934 |   1)               |      gpmi_chain_command() {
>    63.218934 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.218935 |   1)               |          /* mxs chan ccw idx: 2 */
>    63.218936 |   1)   1.875 us    |        }
>    63.218937 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.218938 |   1)               |          /* mxs chan ccw idx: 3 */
>    63.218939 |   1)   1.625 us    |        }
>    63.218939 |   1)   5.875 us    |      }
>    63.218940 |   1)               |      /* gpmi_chain_wait_ready */
>    63.218941 |   1)               |      mxs_dma_prep_slave_sg() {
>    63.218942 |   1)               |        /* mxs chan ccw idx: 4 */
>    63.218942 |   1)   1.625 us    |      }
>    63.218943 |   1)               |      /* gpmi_chain_data_read */
>    63.218944 |   1)               |      mxs_dma_prep_slave_sg() {
>    63.218945 |   1)               |        /* mxs chan ccw idx: 5 */
>    63.218947 |   1)   2.375 us    |      }
>    63.218948 |   1)   0.625 us    |      mxs_dma_tx_submit();
>    63.218949 |   1)   1.000 us    |      mxs_dma_enable_chan();
>    63.219276 |   0)   5.125 us    |  bch_irq();                  <----
>    63.219283 |   1)               |      /* jiffies left 250 */
>    63.219285 |   1) ! 358.625 us  |    }
>    63.219286 |   1)   2.750 us    |    gpmi_count_bitflips();
>    63.219289 |   1) ! 366.000 us  |  }
>    63.219290 |   1)               |  gpmi_ecc_read_page() {
>    63.219291 |   1)   0.750 us    |    gpmi_bch_layout_std();
>    63.219293 |   1)               |    gpmi_nfc_exec_op() {
>    63.219294 |   1)               |      gpmi_chain_command() {
>    63.219295 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.219295 |   0)   1.875 us    |  mxs_dma_int_handler();      <----
>    63.219296 |   1)               |          /* mxs chan ccw idx: 6 */
>    63.219297 |   1)   2.250 us    |        }
>    63.219298 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.219298 |   0)   1.000 us    |  mxs_dma_tasklet();
>    63.219299 |   1)               |          /* mxs chan ccw idx: 0 */
>    63.219300 |   1)   1.625 us    |        }
>    63.219300 |   1)   6.375 us    |      }
>    63.219301 |   1)               |      gpmi_chain_command() {
>    63.219302 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.219303 |   1)               |          /* mxs chan ccw idx: 1 */
>    63.219304 |   1)   1.625 us    |        }
>    63.219305 |   1)               |        mxs_dma_prep_slave_sg() {
>    63.219306 |   1)               |          /* mxs chan ccw idx: 2 */
>    63.219306 |   1)   1.875 us    |        }
>    63.219307 |   1)   6.000 us    |      }
>    63.219308 |   1)               |      /* gpmi_chain_wait_ready */
>    63.219308 |   1)               |      mxs_dma_prep_slave_sg() {
>    63.219309 |   1)               |        /* mxs chan ccw idx: 3 */
>    63.219310 |   1)   2.000 us    |      }
>    63.219311 |   1)               |      /* gpmi_chain_data_read */
>    63.219312 |   1)               |      mxs_dma_prep_slave_sg() {
>    63.219313 |   1)               |        /* mxs chan ccw idx: 4 */
>    63.219314 |   1)   1.750 us    |      }
>    63.219315 |   1)   0.625 us    |      mxs_dma_tx_submit();
>    63.219316 |   1)   0.875 us    |      mxs_dma_enable_chan();
>    64.224227 |   1)               |      /* jiffies left 0 */
> 
> In the first gpmi_nfc_exec_op, bch_irq comes first and gpmi_nfc_exec_op
> exits, but DMA IRQ still not happened yet until the middle of following
> gpmi_nfc_exec_op, the first DMA transfer index get messed and DMA get
> timeout.
> 
> To fix the issue, when there is bch ops in DMA chain, the
> gpmi_nfc_exec_op should wait for both completions rather than bch
> completion only.
> 
> Fixes: ef347c0cfd61 ("mtd: rawnand: gpmi: Implement exec_op")
> Signed-off-by: Han Xu <han.xu at nxp.com>
> Reviewed-by: Sascha Hauer <s.hauer at pengutronix.de>

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 1406495, thanks.

Miquel



More information about the linux-mtd mailing list