[patch 0/6] dma: edma: Provide granular residue accounting
Thomas Gleixner
tglx at linutronix.de
Thu Apr 17 13:31:44 PDT 2014
On Thu, 17 Apr 2014, Russell King - ARM Linux wrote:
> On Thu, Apr 17, 2014 at 02:40:43PM -0000, Thomas Gleixner wrote:
> > The next obstacle was the missing per SG element reporting. We really
> > can't wait for a full SG list for notification.
>
> Err, dmaengine doesn't have per-SG element reporting.
	enum dma_residue_granularity {
		DMA_RESIDUE_GRANULARITY_DESCRIPTOR = 0,
		DMA_RESIDUE_GRANULARITY_SEGMENT = 1,
		DMA_RESIDUE_GRANULARITY_BURST = 2,
	};
tells a different story.
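
For readers unfamiliar with the three levels, here is a minimal userspace sketch of what each granularity might report for a partially completed SG transfer. It is a simplified model, not dmaengine code: `struct sg_xfer` and `xfer_residue()` are hypothetical names, and burst granularity is approximated as byte-exact progress within the current element.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as the enum quoted in the mail. */
enum dma_residue_granularity {
	DMA_RESIDUE_GRANULARITY_DESCRIPTOR = 0,
	DMA_RESIDUE_GRANULARITY_SEGMENT = 1,
	DMA_RESIDUE_GRANULARITY_BURST = 2,
};

/* Hypothetical model of one in-flight SG transfer (not a kernel struct). */
struct sg_xfer {
	const size_t *seg_len;	/* length of each SG element in bytes */
	size_t nsegs;		/* number of SG elements */
	size_t segs_done;	/* elements fully completed by the hardware */
	size_t cur_seg_done;	/* bytes already moved in the current element */
};

/* Bytes still outstanding, as seen at the given reporting granularity. */
static size_t xfer_residue(const struct sg_xfer *x,
			   enum dma_residue_granularity g)
{
	size_t i, residue = 0;

	assert(x->segs_done <= x->nsegs);

	/* Whole descriptor finished: nothing left at any granularity. */
	if (x->segs_done == x->nsegs)
		return 0;

	assert(x->cur_seg_done <= x->seg_len[x->segs_done]);

	/* Descriptor granularity: all or nothing, so report the full length. */
	if (g == DMA_RESIDUE_GRANULARITY_DESCRIPTOR) {
		for (i = 0; i < x->nsegs; i++)
			residue += x->seg_len[i];
		return residue;
	}

	/* Segment granularity: only fully completed elements are subtracted. */
	for (i = x->segs_done; i < x->nsegs; i++)
		residue += x->seg_len[i];

	/* Burst granularity (simplified): also subtract in-element progress. */
	if (g == DMA_RESIDUE_GRANULARITY_BURST)
		residue -= x->cur_seg_done;

	return residue;
}
```

With three elements of 100, 200 and 300 bytes, one element complete and 50 bytes into the second, this model reports 600, 500 and 450 bytes for the three levels respectively.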
> What it does allow is several transactions to be submitted consecutively,
> so that the DMA engine can move to the next transaction once the previous
> one has been submitted.
>
> Where it's important that this happens with the minimum of delay, there's
> nothing in the API that prevents the hardware scatterlist of the previous
> transaction being linked directly to the following transaction, provided
> of course the hardware can do that.
Right. I hoped that this would be the case, as you would expect from
DMA, but as you observed correctly:
> Many DMA engine implementations are just lazy - they implement stuff as:
> setup hardware, run scatter list, get to the end, raise interrupt. Fire
> off tasklet. Tasklet runs, calls the callback, checks to see if there's
> another transaction, sets up hardware for the next one. That (as you
> would expect) gives quite a high latency to the following transaction.
Yep. It's just unusable for low latency applications.
> I've coded at least one DMA engine driver to start the next transaction
> immediately that the previous one completes, before the tasklet is run.
> As I say above, there's really no reason to even wait for the interrupt...
> if people can be bothered to think about all the implications that brings
> (f.e. reporting completion status, and how many bytes remaining of a
> transaction, etc.)
The EDMA HW would allow that as well, but the driver is definitely not
up to it and to be honest I didn't have the cycles to rewrite it from
scratch as that would be the only way to make that work.
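
The difference between the "lazy" scheme and the one Russell describes can be sketched in a few lines. This is a self-contained userspace model under stated assumptions, not code from any driver: `struct chan`, `hw_start()`, `chan_irq()` and `chan_tasklet()` are hypothetical stand-ins, and the queue is a fixed array with no wraparound. The point is only the ordering: the completion interrupt kicks off the next transaction first and defers the callbacks.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TXN 8

/* Hypothetical transaction with an optional completion callback. */
struct txn {
	int id;
	void (*callback)(int id);
};

/* Hypothetical channel state; zero-initialise before use. */
struct chan {
	struct txn *pending[MAX_TXN];	/* submitted, not yet started */
	size_t head, tail;
	struct txn *active;		/* currently running on the hardware */
	struct txn *done[MAX_TXN];	/* finished, callback not yet run */
	size_t ndone;
	int hw_starts;			/* how often the hardware was programmed */
	int completed;			/* transactions handled by the tasklet */
};

/* Stand-in for programming the hardware with a transaction. */
static void hw_start(struct chan *c, struct txn *t)
{
	c->active = t;
	c->hw_starts++;
}

/* Queue a transaction; start it right away if the channel is idle. */
static int chan_submit(struct chan *c, struct txn *t)
{
	if (c->tail == MAX_TXN)
		return -1;
	c->pending[c->tail++] = t;
	if (!c->active)
		hw_start(c, c->pending[c->head++]);
	return 0;
}

/* Completion interrupt: restart the hardware first, defer the callback. */
static void chan_irq(struct chan *c)
{
	struct txn *finished = c->active;

	assert(finished != NULL);
	c->active = NULL;
	if (c->head < c->tail)
		hw_start(c, c->pending[c->head++]);
	c->done[c->ndone++] = finished;
}

/* Deferred work: run callbacks for everything that has finished. */
static void chan_tasklet(struct chan *c)
{
	size_t i;

	for (i = 0; i < c->ndone; i++) {
		if (c->done[i]->callback)
			c->done[i]->callback(c->done[i]->id);
		c->completed++;
	}
	c->ndone = 0;
}
```

In the lazy variant the `hw_start()` call would sit at the end of `chan_tasklet()` instead, so the hardware idles for the whole tasklet scheduling delay between two transactions.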
> If it's just that the FIFO is spread over 4 consecutive locations
> (effectively due to not decoding bits 2,3 of the address bus for the
> register) then reading the first register four times is just as
> acceptable as reading them consecutively.
It's not a FIFO. It's four different consecutive registers, which are
DMA readable. And you need to read all of them...
> The reason that kind of thing was done in old days was to allow the
> ARM ldmia/stmia instructions to be used to access FIFOs, thereby
> allowing multiple words to be transferred with a single instruction.
> I can't believe that there's still people designing for that
> especially if they have a DMA engine...
In that case it's a magic DMA extension superglued beside the already
horrible register interface of that particular IP block.
Thanks,
tglx