DMA Engine API performance issues

Dan Williams dan.j.williams at intel.com
Wed Apr 14 20:08:24 EDT 2010


On Tue, Apr 13, 2010 at 11:42 PM, melwyn lobo <linux.melwyn at gmail.com> wrote:
> Hello Dan,
> I have some questions regarding DMA engine API usage by its clients,
> for example MMC, ALSA, USB, etc.
>
> I am going to take the ALSA framework as an example. Audio data
> transfer is initiated in soc_pcm_trigger(), which is called in atomic
> context, with a spinlock held and IRQs disabled. Here most drivers
> start the data transfer from the MSP peripheral to the audio codec via
> the tx_submit implementation of the DMA engine driver. tx_submit
> enqueues the transaction on an active list, which the driver protects
> with a spinlock taken with bottom halves disabled.
> So when spin_unlock_bh() is called in this path, the kernel detects
> that IRQs are disabled and generates a warning.
> The workaround here for ALSA drivers would be to use a tasklet or
> workqueue to defer the call to a context where it is permitted, but
> that would cause performance problems (the same soc_pcm_trigger() is
> also called to stop the transfer) in cases where the stream has to be
> repeatedly stopped and started.
>
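The pattern being described looks roughly like this (a simplified
sketch of a driver's tx_submit; the foo_* names are made up, not taken
from any real driver):

    static dma_cookie_t foo_tx_submit(struct dma_async_tx_descriptor *tx)
    {
            struct foo_dma_chan *fc = to_foo_chan(tx->chan);
            dma_cookie_t cookie;

            spin_lock_bh(&fc->lock);
            cookie = ++fc->chan.cookie;     /* assign the cookie and
                                             * queue the descriptor */
            tx->cookie = cookie;
            list_add_tail(&to_foo_desc(tx)->node, &fc->active);
            /* local_bh_enable() WARNs when irqs are already disabled,
             * which they are underneath soc_pcm_trigger() */
            spin_unlock_bh(&fc->lock);

            return cookie;
    }
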
> So the core issue is the use of spin_unlock_bh() in an atomic context.
> Workarounds to remove the warning would be:
> 1. Use spin_lock_irqsave() and the corresponding unlock function,
> which does not generate a warning in this situation.
>  But this could be futile in the case where the tasklet is scheduled
> from ksoftirqd, which could lead to corruption.
>  It also means that interrupts stay disabled (on the local CPU) until
> the function completes, which is not desirable.
> 2. Call local_irq_enable() before calling the DMA APIs and disable
> interrupts again once done. This is a crude solution, and
> understandably undesirable and dangerous.
>
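For reference, workaround 1 amounts to converting the same lock to the
irqsave variants (same made-up foo_* driver as above):

    static dma_cookie_t foo_tx_submit(struct dma_async_tx_descriptor *tx)
    {
            struct foo_dma_chan *fc = to_foo_chan(tx->chan);
            unsigned long flags;
            dma_cookie_t cookie;

            spin_lock_irqsave(&fc->lock, flags);
            cookie = ++fc->chan.cookie;
            tx->cookie = cookie;
            list_add_tail(&to_foo_desc(tx)->node, &fc->active);
            /* restores the caller's irq state, no warning */
            spin_unlock_irqrestore(&fc->lock, flags);

            return cookie;
    }

The driver's cleanup tasklet has to take fc->lock under a consistent
scheme as well for this to be safe.
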
> The DMA Engine framework assumes that channel interrupt handling is
> done in a tasklet (dma_run_dependencies()), which I believe is the
> reason for the issue.

dma_run_dependencies() is only needed in the channel switching case
which really only applies to the raid/mem-to-mem usage model (i.e. xor
on one channel followed by copy on another).  In the mem-to-io model
you should not need to perform channel switching.  I suggest following
what the other mem-to-io drivers (ipu, dw_dmac, coh...) have
implemented with their locks.
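
To make the distinction concrete, channel switching only comes into
play for dependency chains built through the async_tx api, along these
lines (a sketch; dest, srcs, src_cnt, len, copy_dest, callback and
cb_param are placeholders):

    #include <linux/async_tx.h>

    struct async_submit_ctl submit;
    struct dma_async_tx_descriptor *tx;

    /* the xor may be assigned to one channel... */
    init_async_submit(&submit, ASYNC_TX_XOR_DROP_DST, NULL,
                      NULL, NULL, NULL);
    tx = async_xor(dest, srcs, 0, src_cnt, len, &submit);

    /* ...and the dependent copy to another; dma_run_dependencies(),
     * called from the first channel's completion tasklet, is what
     * starts the copy on the second channel */
    init_async_submit(&submit, 0, tx, callback, cb_param, NULL);
    tx = async_memcpy(copy_dest, dest, 0, 0, len, &submit);

A slave driver never sees chains like this, so its descriptor cleanup
does not depend on the tasklet-based dependency machinery.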

In general the dmaengine api is meant to provide 1/ a method for
matching dma consumers with capable dma devices, and 2/ a
platform-agnostic api for issuing mem-to-mem and simple mem-to-io
(slave) dma.  If the
current framework provides everything you need, then by all means use
it, but you may find there are architecture specific concerns that
cannot be supported under the existing mem-to-io model.  In other
words the dmaengine abstraction stops being useful and gets in the way
when there are specific architecture considerations beyond simple
channel to slave-device associations.  For example, dw_dmac and ipu
are successfully using the dma-slave interface while the PXA folks are
sticking with their local dma api.
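
For completeness, the slave usage model the api targets boils down to
something like this (a sketch; filter_fn, filter_param, sgl, sg_len,
xfer_done and ctx are placeholders, and the platform-specific slave
configuration is elided):

    #include <linux/dmaengine.h>

    dma_cap_mask_t mask;
    struct dma_chan *chan;
    struct dma_async_tx_descriptor *tx;

    dma_cap_zero(mask);
    dma_cap_set(DMA_SLAVE, mask);
    chan = dma_request_channel(mask, filter_fn, filter_param);

    tx = chan->device->device_prep_slave_sg(chan, sgl, sg_len,
                                            DMA_TO_DEVICE,
                                            DMA_PREP_INTERRUPT);
    tx->callback = xfer_done;       /* completion runs from the
                                     * driver's tasklet */
    tx->callback_param = ctx;
    tx->tx_submit(tx);              /* the call that takes the
                                     * channel lock */
    dma_async_issue_pending(chan);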

So use it if it simplifies your development more than it complicates it.

--
Dan


