[PATCH fix] mtd: gpmi: wake up the process at the end of the DMA callback

Brian Norris computersforpeace at gmail.com
Mon Nov 11 15:15:45 EST 2013


Hi Huang,

On Mon, Nov 11, 2013 at 12:13:45PM +0800, Huang Shijie wrote:
> [1] The gpmi uses the nand_command_lp to issue the commands to NAND chips.
>     The gpmi issues a DMA operation with gpmi_cmd_ctrl when it handles
>     a NAND_CMD_NONE control command. So when we read a page(NAND_CMD_READ0)
>     from the NAND, we may send two DMA operations back-to-back.
> 
>     If we do not serialize the two DMA operations, we will meet a bug when
> 
>     1.1) we enable CONFIG_DMA_API_DEBUG, CONFIG_DMADEVICES_DEBUG,
>          and CONFIG_DEBUG_SG.
> 
>     1.2) Use the following commands in an UART console and a SSH console:
>          cmd 1: while true;do dd if=/dev/mtd0 of=/dev/null;done
>          cmd 1: while true;do dd if=/dev/mmcblk0 of=/dev/null;done
> 
>     The kernel log shows below:
>     -----------------------------------------------------------------
>     kernel BUG at lib/scatterlist.c:28!
>     Unable to handle kernel NULL pointer dereference at virtual address 00000000
>       .........................
>     [<80044a0c>] (__bug+0x18/0x24) from [<80249b74>] (sg_next+0x48/0x4c)
>     [<80249b74>] (sg_next+0x48/0x4c) from [<80255398>] (debug_dma_unmap_sg+0x170/0x1a4)
>     [<80255398>] (debug_dma_unmap_sg+0x170/0x1a4) from [<8004af58>] (dma_unmap_sg+0x14/0x6c)
>     [<8004af58>] (dma_unmap_sg+0x14/0x6c) from [<8027e594>] (mxs_dma_tasklet+0x18/0x1c)
>     [<8027e594>] (mxs_dma_tasklet+0x18/0x1c) from [<8007d444>] (tasklet_action+0x114/0x164)
>     -----------------------------------------------------------------
> 
>     1.3) Assume the two DMA operations is X (first) and Y (second).
> 
>          The root cause of the bug:
> 	   Assume process P issues DMA X, and sleep on the completion
> 	 @this->dma_done. X's tasklet callback is dma_irq_callback. It firstly
> 	 wake up the process sleeping on the completion @this->dma_done,
> 	 and then trid to unmap the scatterlist S. The waked process P will
> 	 issue Y in another ARM core. Y initializes S->sg_magic to zero
> 	 with sg_init_one(), while dma_irq_callback is unmapping S at the same
> 	 time.
> 
> 	 See the diagram:
> 
>                    ARM core 0              |         ARM core 1
> 	 -------------------------------------------------------------
>          (P issues DMA X, then sleep)  --> |
>                                            |
>          (X's tasklet wakes P)         --> |
>                                            |
>                                            | <-- (P begin to issue DMA Y)
>                                            |
>          (X's tasklet unmap the            |
>       scatterlist S with dma_unmap_sg) --> | <-- (Y calls sg_init_one() to init
>                                            |      scatterlist S)
>                                            |
> 
> [2] This patch serialize both the X and Y in the following way:
>      Unmap the DMA scatterlist S firstly, and wake up the process at the end
>      of the DMA callback, in such a way, Y will be executed after X.
> 
>      After this patch:
> 
>                    ARM core 0              |         ARM core 1
> 	 -------------------------------------------------------------
>          (P issues DMA X, then sleep)  --> |
>                                            |
>          (X's tasklet unmap the            |
>       scatterlist S with dma_unmap_sg) --> |
>                                            |
>          (X's tasklet wakes P)         --> |
>                                            |
>                                            | <-- (P begin to issue DMA Y)
>                                            |
>                                            | <-- (Y calls sg_init_one() to init
>                                            |     scatterlist S)
>                                            |
> 
> Cc: stable at vger.kernel.org
> Signed-off-by: Huang Shijie <b32955 at freescale.com>
> ---
>  fix the wrong diagram.

Thanks for improving this with (1) a much simpler patch and (2) a very
good diagram. I think it hits at the real heart of the problem without
adding extra concurrency primitives which obscure the code.

>  Resend this patch, CC to stable.

It looks like this is long-standing bug (since the driver was introduced
in v3.2), right? I'll mark it with v3.2+.

> ---
>  drivers/mtd/nand/gpmi-nand/gpmi-nand.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
> index 7ac2280..f89b4fd 100644
> --- a/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
> +++ b/drivers/mtd/nand/gpmi-nand/gpmi-nand.c
> @@ -394,8 +394,6 @@ static void dma_irq_callback(void *param)
>  	struct gpmi_nand_data *this = param;
>  	struct completion *dma_c = &this->dma_done;
>  
> -	complete(dma_c);
> -
>  	switch (this->dma_type) {
>  	case DMA_FOR_COMMAND:
>  		dma_unmap_sg(this->dev, &this->cmd_sgl, 1, DMA_TO_DEVICE);
> @@ -420,6 +418,8 @@ static void dma_irq_callback(void *param)
>  	default:
>  		pr_err("in wrong DMA operation.\n");
>  	}
> +
> +	complete(dma_c);
>  }
>  
>  int start_dma_without_bch_irq(struct gpmi_nand_data *this,

I fixed the subject to describe the problem better ("mtd: gpmi: fix
kernel BUG due to racing DMA operation") and pushed it to l2-mtd.git.
Thanks!

Brian



More information about the linux-mtd mailing list