[PATCH v2] ARM: edma: fix residue race for cyclic

Fri Oct 16 05:28:50 PDT 2015

On 2015-10-16, Peter Ujfalusi <peter.ujfalusi at ti.com> wrote:
> On 10/16/2015 01:26 PM, John Ogness wrote:
>> When retrieving the residue value for cyclic transfers, the
>> SRC/DST fields of the active PaRAM are read. However, the AM335x
>> Technical Reference Manual states:
>> 
>>   11.3.3.6 Parameter Set Updates
>> 
>>   After the TR is read from the PaRAM (and is in the process
>>   of being submitted to the EDMA3TC), the following fields are
>>   updated as needed: ... SRC DST
>> 
>> This means SRC/DST is incremented even though the DMA transfer
>> may not have started yet or is in progress. Thus if the reader
>> of the residue accesses the DMA buffer too quickly, the CPU is
>> misinformed about the data that has been successfully processed.
>> 
>> The CCSTAT.ACTV register is a boolean that is set if any TR is
>> being processed by either the EMDA3CC or EDMA3TC. By polling
>> this register it is possible to ensure that the residue value
>> returned is valid for immediate processing. However, since the
>> DMA engine may be active, polling may never hit a moment where
>> no TR is being processed. To handle this, the SRC/DST is also
>> polled to see if it changes. And as a last resort, a max loop
>> count for the busy waiting exists to avoid an infinite loop.
>
> I'm not sure what this actually going to solve, except that we are
> going to wait for the next PaRAM parameter update.

We are waiting until the active transfers are complete. The main wait
condition is when ACTV is 0. When this happens, all transfers are
definately complete. In the normal case, this is the condition that
causes the loop exit.

> As you have already described the issue is that when you first submit
> the transfer, the first PaRAM will be submitted to TC and at this
> point the parameters will be updated to be prepared for the next
> update.  This means that after we initiate the DMA the SRC/DST will be
> updated to point to the next batch of data. Reading SRC/DST before the
> second parameter update will always give you this. But this is also
> true further in the line.  We never know exactly where the DMA is, we
> only know that the DMA is somewhere in between 0x1234 - (0x1234 + data
> until next parameter update). At the next parameter update you again
> can not be sure where the DMA is, since it is now somewhere between
> (0x1234 + parameter update number of data) - till the next update, and
> so on.
>
> In the cyclic case we use AB-sync mode, in this case the parameter update is
> going to happen after each period if I'm not mistaken.
>
> To achieve the same thing (waiting for the next parameter update to happen)
> you could just poll the PaRAM's CCNT in AB-sync to see if it changing and when
> it did you return the SRC/DST address since those will be close enough at the
> time.

Is there really a difference between polling CCNT and polling SRC/DST?
Notice that the function does _not_ return the polled SRC/DST. The extra
polling is only used as an additional loop exit condition in case we
missed ACTV being 0. This condition does occur once in while (during my
3Mbit UART tests, about once every 100000 transfers).

> But what happens if the period size if big and the position is asked
> just right after the parameter update? We can not know this, so we
> spin a bit here and give up and return whatever we had in SRC/DST.

I would hope that we never enter the "give up" condition. If a transfer
request was started, that means one burst of data is supposed to be
ready, which means the burst transfer should execute quickly. If
something unexpected happens (certain clocks suddenly turn off, etc),
then we fallback to the "give up" condition and return bad data to the
caller.

I was considering if dmaengine_tx_status() should somehow notify that it
is not possible to identify the residue (i.e. we gave up waiting for the
transfer to complete). But that really adds new semantics to the DMA
API, which is something much bigger.

An alternative would be to always return a "last known residue". But
this can really only be tracked during DMA completion interrupts and
successful edma_residue() calls. Both of which cannot guarentee that we
didn't miss a completed transfer. That is particularly a problem for the
UART driver, where a call to dmaengine_tx_status() is then followed by
PIO. That means dmaengine_tx_status() needs to return as much DMA data
as is really available.

> Not sure how to deal with this, but for sure needs more thought...

This patch closes a very real and reproducable race window in the eDMA
driver. Aside from adding busy waiting cycles it does not produce worse
results than before. But I am open to ideas.

John Ogness