pcm|dmaengine|imx-sdma race condition on i.MX6

Robin Gong yibin.gong at nxp.com
Fri Aug 14 04:45:17 EDT 2020


On 2020/08/13 19:23: Richard Leitner <richard.leitner at skidata.com> wrote:
> Hi,
> we've found a race condition with the PCM on the i.MX6 which results in an
> -EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN).
> 
> A possible reproduction may look like the following reduced call graph during a
> PCM capture:
> 
> us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES)
>       - wait_for_avail()
>         - schedule_timeout()
>    -> snd_pcm_update_hw_ptr0()
>       - snd_pcm_update_state: EPIPE (XRUN)
>       - sdma_disable_channel_async() # get's scheduled away due to sleep us
> <- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE us ->
> ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN) us ->
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames
>       - sdma_prep_dma_cyclic()
>         - sdma_load_context() # not loaded as context_loaded is 1
>       - wait_for_avail()
>         - schedule_timeout()
> # now the sdma_channel_terminate_work() comes back and sets #
> context_loaded = false and frees in vchan_dma_desc_free_list().
> us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?))
Seems the write error caused by context_loaded not set to false before
next transfer start? If yes, please have a try with the 03/04 of the below
patch set, anyway, could you post your failure log?
https://lkml.org/lkml/2020/8/11/111

> 
> 
> What we have found out, based on our understanding:
> The dmaengine docu states that a dmaengine_terminate_async() must be
> followed by a dmaengine_synchronize().
> However, in the pcm_dmaengine.c, only dmaengine_terminate_async() is
> called (for performance reasons and because it might be called from an
> interrupt handler).
> 
> In our tests, we saw that the user-space immediately calls
> ioctl(SNDRV_PCM_IOCTL_PREPARE) as a handler for the happened xrun
> (previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns with -EPIPE). In
> our case (imx-sdma.c), the terminate really happens asynchronously with a
> worker thread which is not awaited/synchronized by the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) call.
> 
> Since the syscall immediately enters an atomic context
> (snd_pcm_stream_lock_irq()), we are not able to flush the work of the
> termination worker from within the DMA context. This leads to an
> unterminated DMA getting re-initialized and then terminated.
> 
> On the i.MX6 platform the problem is (if I got it correctly) that the
> sdma_channel_terminate_work() called after the -EPIPE gets scheduled away
> (for the 1-2ms sleep [1]). During that time the userspace already sends in the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) and
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
> As none of them are anyhow synchronized to the terminate_worker the
> vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" [3] are
> executed during the wait_for_avail() [4] of the
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
> 
> To make sure we identified the problem correctly we've tested to add a
> "dmaengine_synchronize()" before the snd_pcm_prepare() in [5]. This fixed the
> race condition in all our tests. (Before we were able to reproduce it in 100% of
> the test runs).
> 
> Based on our understanding, there are two different points to ensure the
> termination:
> Either ensure that the termination is finished within the previous
> SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context) or finishing
> it in the SNDRV_PCM_IOCTL_PREPARE call (and all other applicable ioclts)
> before entering the atomic context (from the PCM context).
> 
> We initially thought about implementing the first approach, basically splitting
> up the dma_device terminate_all operation into a sync
> (busy-wait) and a async one. This would align the operations with the
> DMAengine interface and would enable a sync termination variant from atomic
> contexts.
> However, we saw that the dma_free_attrs() function has a WARN_ON on irqs
> disabled, which would be the case for the sync variant. 
> Side note: We found this issue on the current v5.4.y LTS branch, but it also
> affects v5.8.y.
> 
> Any feedback or pointers how we may fix the problem are warmly welcome!
> If anything is unclear please just ask :-)
> 
> regards;
> Richard Leitner
> Benjamin Bara
> 
> [1]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1066&data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&sdata=D9F%2FRUG27xv9nv8J1KtrLtld2eaI6gsXiWIAIgk
> Avjw%3D&reserved=0
> [2]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1071&data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&sdata=0EKDVgzOZzL7TpX4ykhqjvpz5ryUHUpWw7frRe
> cksBU%3D&reserved=0
> [3]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1072&data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&sdata=aIhatvb1ocQqyYCVFEg71LgJlRBoVusbDFPIxnte
> PuY%3D&reserved=0
> [4]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_lib.c%23L1
> 825&data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7e
> 7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6373
> 29145824073919&sdata=y0Udbd%2FKGaVgqLrcp6fNOlMlFCGHCMfojkpp
> B4HzUuE%3D&reserved=0
> [5]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_native.c%2
> 3L3226&data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f
> 7e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63
> 7329145824073919&sdata=ch3BQ5DDGU5HWXqIZSvUeFnBoRoP%2BMM
> HEpnk8mIfWj8%3D&reserved=0



More information about the linux-arm-kernel mailing list