pcm|dmaengine|imx-sdma race condition on i.MX6

Wed Aug 19 07:08:29 EDT 2020

On 8/17/20 9:28 AM, Benjamin Bara - SKIDATA wrote:
> We think this is not an i.MX6-specific problem, but a problem of the DMAengine usage from the PCM.
> In case of a XRUN, the DMA channel is never closed but first a SNDRV_PCM_TRIGGER_STOP next a
> SNDRV_PCM_TRIGGER_START is triggered.
> The SNDRV_PCM_TRIGGER_STOP simply executes a dmaengine_terminate_async() [1]
> but does not await the termination by calling dmaengine_synchronize(),
> which is required as stated by the docu [2].
> Anyways, we are not able to fix it in the pcm_dmaengine layer either at the end of
> SNDRV_PCM_TRIGGER_STOP (called from the DMA on complete interrupt handler)
> or at the beginning of SNDRV_PCM_TRIGGER_START (called from a PCM ioctl),
> since the dmaengine_synchronize() requires a non-atomic context.

I think this might be an sdma specific problem after all. 
dmaengine_terminate_async() will issue a request to stop the DMA. But it 
is still safe to issue the next transfer, even without calling 
dmaengine_synchronize(). The DMA should start the new transfer at its 
earliest convenience in that case.

dmaegine_synchronize() is so that the consumer has a guarantee that the 
DMA is finished using the resources (e.g. the memory buffers) associated 
with the DMA transfer so it can safely free them.

>
> Based on my understanding, most of the DMA implementations don't even implement device_synchronize
> and if they do, it might not really be necessary since the terminate_all operation is synchron.
There are a lot of buggy DMAengine drivers :) Pretty much all of them 
need device_synchronize() to get this right.
>
> With the i.MX6, it looks a bit different:
> Since [4], the terminate_all operation really schedules a worker which waits the required ~1ms and
> then does the context freeing.
> Now, the ioctl(SNDRV_PCM_IOCTL_PREPARE) and the following ioctl(SNDRV_PCM_IOCTL_READI_FRAMES),
> which are called from US to handle/recover from a XRUN, are in a race with the terminate_worker.
> If the terminate_worker finishes earlier, everything is fine.
> Otherwise, the sdma_prep_dma_cyclic() is called, sets up everything and
> as soon as it is scheduled out to wait for data, the terminate_worker is scheduled and kills it.
> In this case, we wait in [5] until the timeout is reached and return with -EIO.
>
> Based on our understanding, there exist two different fixing approaches:
> We thought that the pcm_dmaengine should handle this by either synchronizing the DMA on a trigger or
> terminating it synchronously.
> However, as we are in an atomic context, we either have to give up the atomic context of the PCM
> to finish the termination or we have to design a synchronous terminate variant which is callable
> from an atomic context.
>
> For the first option, which is potentially more performant, we have to leave the atomic PCM context
> and we are not sure if we are allowed to.
> For the second option, we would have to divide the dma_device terminate_all into an atomic sync and
> an async one, which would align with the dmaengine API, giving it the option to ensure termination
> in an atomic context.
> Based on my understanding, most of them are synchronous anyways, for the currently async ones we
> would have to implement busy waits.
> However, with this approach, we reach the WARN_ON [6] inside of an atomic context,
> indicating we might not do the right thing.

I don't know how feasible this is to implement in the SDMA dmaengine 
driver. But I think what is should do is to have some flag to indicate 
if a terminate is in progress. If a new transfer is issued while 
terminate is in progress the transfer should go on a list. Once 
terminate finishes it should check the list and start the transfer if 
there are any on the list.

- Lars