[RFC] Redesign drivers/dma/mmp_pdma
Robert Jarzmik
robert.jarzmik at free.fr
Fri Feb 20 09:19:33 PST 2015
Hi Daniel and Vinod,
This is what I have in mind for the next evolution of mmp_pdma. Aside from the
first parts of this document, I have questions in the second part I wish I had
comments for.
This is a pretty thick piece to injest, so take your time, I won't begin
submitting anything around mmp_pdma anymore until we have a good discussion
about it, at least with Daniel.
The root of this goes back to 2013 (references in [1]), where we had a
discussion between Daniel, Vinod and me, and where the conclusion was "DmaEngine
slave API permits pxa_camera to work as before", it's just I didn't take the
time to write down what was needed in the pxa dma driver to shift pxa_camera to
dmaengine, given my knowledge of pxa_camera dma handling.
Any comments are welcome, before I begin the actual work.
Cheers.
--
Robert
---8>---
>From 1c0b943a0f58fb7eb1aac9c58a61abe21cb13b95 Mon Sep 17 00:00:00 2001
From: Robert Jarzmik <robert.jarzmik at free.fr>
Date: Fri, 20 Feb 2015 17:50:32 +0100
Subject: [PATCH] Documentation: dmaengine: mmp_pdma design
Document the new design of the mmp_pdma dma driver.
Signed-off-by: Robert Jarzmik <robert.jarzmik at free.fr>
---
Documentation/dmaengine/mmp_pdma.txt | 147 +++++++++++++++++++++++++++++++++++
1 file changed, 147 insertions(+)
create mode 100644 Documentation/dmaengine/mmp_pdma.txt
diff --git a/Documentation/dmaengine/mmp_pdma.txt b/Documentation/dmaengine/mmp_pdma.txt
new file mode 100644
index 0000000..78ce486
--- /dev/null
+++ b/Documentation/dmaengine/mmp_pdma.txt
@@ -0,0 +1,147 @@
+PXA/MMP - DMA Slave controller
+==============================
+
+Constraints
+-----------
+ a) Transfers hot queuing
+ A driver submitting a transfer and issuing it should be granted the transfer
+ is queued even on a running DMA channel.
+ This implies that the queuing doesn't wait for the previous transfer end,
+ and that the descriptor chaining is not only done in the irq/tasklet code
+ triggered by the end of the transfer.
+
+ b) All transfers having asked for confirmation should be signaled
+ Any issued transfer with DMA_PREP_INTERRUPT should trigger a callback call.
+ This implies that even if an irq/tasklet is triggered by end of tx1, but
+ at the time of irq/dma tx2 is already finished, tx1->complete() and
+ tx2->complete() should be called.
+
+ c) Channel residue calculation
+ A channel should be able to report how much advanced is a transfer. The
+ granularity is still [TBD].
+
+ d) Channel running state
+ A driver should be able to query if a channel is running or not. For the
+ multimedia case, such as video capture, if a transfer is submitted and then
+ a check of the DMA channel reports a "stopped channel", the transfer should
+ not be issued until the next "start of frame interrupt", hence the need to
+ know if a channel is in running or stopped state.
+
+ e) Bandwidth guarantee
+ The PXA architecture has 3 levels of DMAs priorities : high, normal, low.
+ The high prorities get twice as much bandwith as the normal, which get twice
+ as much as the low priorities.
+ A driver should be able to request a priority, especially the real-time
+ ones such as pxa_camera with (big) throughputs.
+
+ f) Transfer reusability
+ An issued and finished transfer should be "reusable". The choice of
+ "DMA_CTRL_ACK" should be left to the client, not the dma driver.
+
+Design
+------
+ a) Virtual channels
+ Same concept as in sa11x0 driver, ie. a driver was assigned a "virtual
+ channel" linked to the requestor line, and the physical DMA channel is
+ assigned on the fly when the transfer is issued.
+
+ b) Transfer anatomy for a scatter-gather transfer
+ +------------+-----+---------------+----------------+-----------------+
+ | desc-sg[0] | ... | desc-sg[last] | status updater | finisher/linker |
+ +------------+-----+---------------+----------------+-----------------+
+
+ This structure is pointed by dma->sg_cpu.
+ The descriptors are used as follows :
+ - desc-sg[i]: i-th descriptor, transferring the i-th sg
+ element to the video buffer scatter gather
+ - status updater
+ Transfers a single u32 to a well known dma coherent memory to leave
+ a trace that this transfer is done. The "well known" is unique per
+ physical channel, meaning that a read of this value will tell which
+ is the last finished transfer at that point in time.
+ - finisher: has ddadr=DADDR_STOP, dcmd=ENDIRQEN
+ - linker: has ddadr= desc-sg[0] of next transfer, dcmd=0
+
+ b) Transfers hot-chaining
+ Suppose the running chain is :
+ Buffer 1 Buffer 2
+ +---------+----+---+ +----+----+----+---+
+ | d0 | .. | dN | l | | d0 | .. | dN | f |
+ +---------+----+-|-+ ^----+----+----+---+
+ | |
+ +----+
+
+ After a call to dma_async_issue_pending(), the chain will look like :
+ Buffer 1 Buffer 2 Buffer 3
+ +---------+----+---+ +----+----+----+---+ +----+----+----+---+
+ | d0 | .. | dN | l | | d0 | .. | dN | l | | d0 | .. | dN | f |
+ +---------+----+-|-+ ^----+----+----+-|-+ ^----+----+----+---+
+ | | | |
+ +----+ +----+
+ new_link
+
+ If while new_link was created the DMA channel stopped, it is reactivated
+ with DDADR = Buffer3.d0. To know if it stopped because Buffer3 was already
+ taken care of, see below "transfers completion updater".
+
+ c) Transfers completion updater
+ Each time a transfer is completed on a channel, an interrupt might be
+ generated or not, up to the client's request. But in each case, the last
+ descriptor of a transfer, the "status updater", will write the latest
+ transfer being completed into the physical channel's completion mark.
+
+ This will speed up residue calculation, for large transfers such as video
+ buffers which hold around 6k descriptors or more. This also allows without
+ any lock to find out what is the latest completed transfer in a running
+ DMA chain.
+
+ d) Transfers completion, irq and tasklet
+ When a transfer flagged as "DMA_PREP_INTERRUPT" is finished, the dma irq
+ is raised. Upon this interrupt, a tasklet is scheduled for the physical
+ channel.
+ The tasklet is responsible for :
+ - reading the physical channel last updater mark
+ - calling all the transfer callbacks of finished transfers, based on
+ that mark, and each transfer flags.
+ If a transfer is completed while this handling is done, a dma irq will
+ be raised, and the tasklet will be scheduled once again, having a new
+ updater mark.
+
+ e) Residue
+ Residue granularity will be descriptor based. The issued but not completed
+ transfers will be scanned for all of their descriptors against the
+ currently running descriptor.
+
+---
+This below will be dropped from the submission when the RFC will become a true patch.
+
+Remaining questions:
+-------------------
+ 1) Given the number of things to be changed, is it still worth patching
+ existing mmp_pdma or develop a new driver : the main changes will come
+ from virt-dma usage, and the irq/tasklet/hot chaining revamp ?
+ I'm expecting around 50% of code reuse, future will tell ...
+
+ 2) Cyclic transfers and residue calculation
+ For this iteration, I'll leave cyclic transfers as they are. But I was
+ wondering if it wouldn't make sense to allocate dma descriptors for
+ the _unique_ running cyclic transfer in a page, so that all the descriptors
+ are contiguous.
+ Of course, this would limit the span of a cyclic transfer to :
+ 4096 / (32 * 4) * 4096 = 2097152 bytes
+ On the other hand, the residue calculation changes from a chained list
+ traversal to a simple (DDADR - desc0) / (32 * 4) * 4096, assuming all
+ scatter gather elements have the same 4k length ...
+
+ Now the question is, will it be worth it ?
+
+ 3) DmaEngine and DMA_CTRL_ACK : this one is for Dan/Vinod
+ In Documentation/dmaengine/provider.txt, it is written that "DMA_CTRL_ACK":
+ "No one really has an idea of what it's about".
+ Actually that fits exactly one of my requirements : being able to reuse a
+ transfer without having to recalculate all the descriptors for it. The
+ rationale behind is that for video buffers, of 6k descriptors, the chain
+ is reused when the transfer is submitted/issued again, without having to
+ re-allocate and compose the whole descirptor links.
+
+ Is my understanding of what DMA_CTRL_ACK should mean correct ?
+
+ 4) DmaEngine providing channels with properties
+ I need that the provided channel matches bandwidth criteria. It's not
+ doable to assign the most bandwidth capable channel to the first
+ requestor, but rather have the requestor ask for a "minimum" level of
+ priority (between low/medium/high).
+
+ How is this supposed to be done ?
--
2.1.0
--
Robert
[1] References
http://lists.infradead.org/pipermail/linux-mtd/2013-August/047881.html
http://lists.infradead.org/pipermail/linux-mtd/2013-August/048086.html
More information about the linux-arm-kernel
mailing list