[PATCH v2 2/2] mtd: rawnand: gpmi: Fix the random DMA timeout issue
Miquel Raynal
miquel.raynal at bootlin.com
Thu Dec 10 16:39:06 EST 2020
On Wed, 2020-12-09 at 03:51:04 UTC, Han Xu wrote:
> To get better performance, current gpmi driver collected and chained all
> small DMA transfers in gpmi_nfc_exec_op, the whole chain triggered and
> wait for complete at the end.
>
> But some random DMA timeout found in this new driver, with the help of
> ftrace, we found the root cause is as follows:
>
> Take gpmi_ecc_read_page() as an example, gpmi_nfc_exec_op collected 6
> DMA transfers and the DMA chain triggered at the end. It waits for bch
> completion and check jiffies if it's timeout. The typical function graph
> shown below,
>
> 63.216351 | 1) | gpmi_ecc_read_page() {
> 63.216352 | 1) 0.750 us | gpmi_bch_layout_std();
> 63.216354 | 1) | gpmi_nfc_exec_op() {
> 63.216355 | 1) | gpmi_chain_command() {
> 63.216356 | 1) | mxs_dma_prep_slave_sg() {
> 63.216357 | 1) | /* mxs chan ccw idx: 0 */
> 63.216358 | 1) 1.750 us | }
> 63.216359 | 1) | mxs_dma_prep_slave_sg() {
> 63.216360 | 1) | /* mxs chan ccw idx: 1 */
> 63.216361 | 1) 2.000 us | }
> 63.216361 | 1) 6.500 us | }
> 63.216362 | 1) | gpmi_chain_command() {
> 63.216363 | 1) | mxs_dma_prep_slave_sg() {
> 63.216364 | 1) | /* mxs chan ccw idx: 2 */
> 63.216365 | 1) 1.750 us | }
> 63.216366 | 1) | mxs_dma_prep_slave_sg() {
> 63.216367 | 1) | /* mxs chan ccw idx: 3 */
> 63.216367 | 1) 1.750 us | }
> 63.216368 | 1) 5.875 us | }
> 63.216369 | 1) | /* gpmi_chain_wait_ready */
> 63.216370 | 1) | mxs_dma_prep_slave_sg() {
> 63.216372 | 1) | /* mxs chan ccw idx: 4 */
> 63.216373 | 1) 3.000 us | }
> 63.216374 | 1) | /* gpmi_chain_data_read */
> 63.216376 | 1) | mxs_dma_prep_slave_sg() {
> 63.216377 | 1) | /* mxs chan ccw idx: 5 */
> 63.216378 | 1) 2.000 us | }
> 63.216379 | 1) 1.125 us | mxs_dma_tx_submit();
> 63.216381 | 1) 1.000 us | mxs_dma_enable_chan();
> 63.216712 | 0) 2.625 us | mxs_dma_int_handler();
> 63.216717 | 0) 4.250 us | bch_irq();
> 63.216723 | 0) 1.250 us | mxs_dma_tasklet();
> 63.216723 | 1) | /* jiffies left 250 */
> 63.216725 | 1) ! 372.000 us | }
> 63.216726 | 1) 2.625 us | gpmi_count_bitflips();
> 63.216730 | 1) ! 379.125 us | }
>
> but it's not gurantee that bch irq handled always after dma irq handled,
> sometimes bch_irq comes first and gpmi_nfc_exec_op won't wait anymore,
> another gpmi_nfc_exec_op may get invoked before last DMA chain IRQ
> handled, this messed up the next DMA chain and causes DMA timeout. Check
> the trace log when issue happened.
>
> 63.218923 | 1) | gpmi_ecc_read_page() {
> 63.218924 | 1) 0.625 us | gpmi_bch_layout_std();
> 63.218926 | 1) | gpmi_nfc_exec_op() {
> 63.218927 | 1) | gpmi_chain_command() {
> 63.218928 | 1) | mxs_dma_prep_slave_sg() {
> 63.218929 | 1) | /* mxs chan ccw idx: 0 */
> 63.218929 | 1) 1.625 us | }
> 63.218931 | 1) | mxs_dma_prep_slave_sg() {
> 63.218931 | 1) | /* mxs chan ccw idx: 1 */
> 63.218932 | 1) 1.750 us | }
> 63.218933 | 1) 5.875 us | }
> 63.218934 | 1) | gpmi_chain_command() {
> 63.218934 | 1) | mxs_dma_prep_slave_sg() {
> 63.218935 | 1) | /* mxs chan ccw idx: 2 */
> 63.218936 | 1) 1.875 us | }
> 63.218937 | 1) | mxs_dma_prep_slave_sg() {
> 63.218938 | 1) | /* mxs chan ccw idx: 3 */
> 63.218939 | 1) 1.625 us | }
> 63.218939 | 1) 5.875 us | }
> 63.218940 | 1) | /* gpmi_chain_wait_ready */
> 63.218941 | 1) | mxs_dma_prep_slave_sg() {
> 63.218942 | 1) | /* mxs chan ccw idx: 4 */
> 63.218942 | 1) 1.625 us | }
> 63.218943 | 1) | /* gpmi_chain_data_read */
> 63.218944 | 1) | mxs_dma_prep_slave_sg() {
> 63.218945 | 1) | /* mxs chan ccw idx: 5 */
> 63.218947 | 1) 2.375 us | }
> 63.218948 | 1) 0.625 us | mxs_dma_tx_submit();
> 63.218949 | 1) 1.000 us | mxs_dma_enable_chan();
> 63.219276 | 0) 5.125 us | bch_irq(); <----
> 63.219283 | 1) | /* jiffies left 250 */
> 63.219285 | 1) ! 358.625 us | }
> 63.219286 | 1) 2.750 us | gpmi_count_bitflips();
> 63.219289 | 1) ! 366.000 us | }
> 63.219290 | 1) | gpmi_ecc_read_page() {
> 63.219291 | 1) 0.750 us | gpmi_bch_layout_std();
> 63.219293 | 1) | gpmi_nfc_exec_op() {
> 63.219294 | 1) | gpmi_chain_command() {
> 63.219295 | 1) | mxs_dma_prep_slave_sg() {
> 63.219295 | 0) 1.875 us | mxs_dma_int_handler(); <----
> 63.219296 | 1) | /* mxs chan ccw idx: 6 */
> 63.219297 | 1) 2.250 us | }
> 63.219298 | 1) | mxs_dma_prep_slave_sg() {
> 63.219298 | 0) 1.000 us | mxs_dma_tasklet();
> 63.219299 | 1) | /* mxs chan ccw idx: 0 */
> 63.219300 | 1) 1.625 us | }
> 63.219300 | 1) 6.375 us | }
> 63.219301 | 1) | gpmi_chain_command() {
> 63.219302 | 1) | mxs_dma_prep_slave_sg() {
> 63.219303 | 1) | /* mxs chan ccw idx: 1 */
> 63.219304 | 1) 1.625 us | }
> 63.219305 | 1) | mxs_dma_prep_slave_sg() {
> 63.219306 | 1) | /* mxs chan ccw idx: 2 */
> 63.219306 | 1) 1.875 us | }
> 63.219307 | 1) 6.000 us | }
> 63.219308 | 1) | /* gpmi_chain_wait_ready */
> 63.219308 | 1) | mxs_dma_prep_slave_sg() {
> 63.219309 | 1) | /* mxs chan ccw idx: 3 */
> 63.219310 | 1) 2.000 us | }
> 63.219311 | 1) | /* gpmi_chain_data_read */
> 63.219312 | 1) | mxs_dma_prep_slave_sg() {
> 63.219313 | 1) | /* mxs chan ccw idx: 4 */
> 63.219314 | 1) 1.750 us | }
> 63.219315 | 1) 0.625 us | mxs_dma_tx_submit();
> 63.219316 | 1) 0.875 us | mxs_dma_enable_chan();
> 64.224227 | 1) | /* jiffies left 0 */
>
> In the first gpmi_nfc_exec_op, bch_irq comes first and gpmi_nfc_exec_op
> exits, but DMA IRQ still not happened yet until the middle of following
> gpmi_nfc_exec_op, the first DMA transfer index get messed and DMA get
> timeout.
>
> To fix the issue, when there is bch ops in DMA chain, the
> gpmi_nfc_exec_op should wait for both completions rather than bch
> completion only.
>
> Fixes: ef347c0cfd61 ("mtd: rawnand: gpmi: Implement exec_op")
> Signed-off-by: Han Xu <han.xu at nxp.com>
> Reviewed-by: Sascha Hauer <s.hauer at pengutronix.de>
Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 1406495, thanks.
Miquel
More information about the linux-mtd
mailing list