[PATCH] ARM: pl330: Fix a race condition
Javi Merino
javi.merino at arm.com
Tue Sep 20 09:36:24 EDT 2011
On 19/09/11 19:07, Jassi Brar wrote:
> On Mon, Sep 19, 2011 at 10:41 PM, Javi Merino <javi.merino at arm.com> wrote:
>> If two requests have been submitted and one of them is running, if you
>> call pl330_chan_ctrl(ch_id, PL330_OP_START), there's a window of time
>> between the spin_lock_irqsave() and the _state() check in which the
>> running transaction may finish. In that case, we don't receive the
>> interrupt (because they are disabled), but _start() sees that the DMA
>> is stopped, so it starts it. The problem is that it sends the
>> transaction that has just finished again, because pl330_update()
>> hasn't marked it as done yet.
>>
>> This patch moves the _state() check out of the critical section, which
>> removes the race condition. It also treats PL330_STATE_COMPLETING as
>> still executing, because otherwise we would have another race condition now
>> that we call _state() with interrupts enabled. Namely, if we read the
>> state as "completing" and the DMA sends the interrupt before we
>> disable interrupts, pl330_update() starts the next transaction and
>> returns. Then the _start() in pl330_chan_ctrl() will patiently wait
>> until the just issued transaction finishes (because the state we read
>> was PL330_STATE_COMPLETING) and when it does, it _trigger()s the same
>> transaction again.
>>
>> Signed-off-by: Javi Merino <javi.merino at arm.com>
>> Cc: Jassi Brar <jassi.brar at samsung.com>
>> ---
>> arch/arm/common/pl330.c | 12 +++++++-----
>> 1 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/common/pl330.c b/arch/arm/common/pl330.c
>> index 97912fa..26b5615 100644
>> --- a/arch/arm/common/pl330.c
>> +++ b/arch/arm/common/pl330.c
>> @@ -936,9 +936,9 @@ static bool _trigger(struct pl330_thread *thrd)
>> return true;
>> }
>>
>> -static bool _start(struct pl330_thread *thrd)
>> +static bool _start(struct pl330_thread *thrd, u32 state)
>> {
>> - switch (_state(thrd)) {
>> + switch (state) {
>> case PL330_STATE_FAULT_COMPLETING:
>> UNTIL(thrd, PL330_STATE_FAULTING | PL330_STATE_KILLING);
>>
>> @@ -949,7 +949,6 @@ static bool _start(struct pl330_thread *thrd)
>> _stop(thrd);
>>
>> case PL330_STATE_KILLING:
>> - case PL330_STATE_COMPLETING:
>> UNTIL(thrd, PL330_STATE_STOPPED)
>>
>> case PL330_STATE_STOPPED:
>> @@ -961,6 +960,7 @@ static bool _start(struct pl330_thread *thrd)
>> case PL330_STATE_UPDTPC:
>> case PL330_STATE_CACHEMISS:
>> case PL330_STATE_EXECUTING:
>> + case PL330_STATE_COMPLETING:
>> return true;
>>
>> case PL330_STATE_WFE: /* For RESUME, nothing yet */
>> @@ -1471,7 +1471,7 @@ int pl330_update(const struct pl330_info *pi)
>> MARK_FREE(rqdone);
>>
>> /* Get going again ASAP */
>> - _start(thrd);
>> + _start(thrd, _state(thrd));
>>
>> /* For now, just make a list of callbacks to be done */
>> list_add_tail(&rqdone->rqd, &pl330->req_done);
>> @@ -1510,12 +1510,14 @@ int pl330_chan_ctrl(void *ch_id, enum pl330_chan_op op)
>> struct pl330_dmac *pl330;
>> unsigned long flags;
>> int ret = 0, active;
>> + u32 dma_state;
>>
>> if (!thrd || thrd->free || thrd->dmac->state == DYING)
>> return -EINVAL;
>>
>> pl330 = thrd->dmac;
>>
>> + dma_state = _state(thrd);
>> spin_lock_irqsave(&pl330->lock, flags);
>>
>> switch (op) {
>> @@ -1546,7 +1548,7 @@ int pl330_chan_ctrl(void *ch_id, enum pl330_chan_op op)
>>
>> /* Start the next */
>> case PL330_OP_START:
>> - if (!_start(thrd))
>> + if (!_start(thrd, dma_state))
>> ret = -EIO;
>> break;
>>
>> --
>
> IIUIC your race scenario should be taken care of simply by doing...
>
> diff --git a/arch/arm/common/pl330.c b/arch/arm/common/pl330.c
> index 97912fa..7129cfb 100644
> --- a/arch/arm/common/pl330.c
> +++ b/arch/arm/common/pl330.c
> @@ -1546,7 +1546,7 @@ int pl330_chan_ctrl(void *ch_id, enum pl330_chan_op op)
>
> /* Start the next */
> case PL330_OP_START:
> - if (!_start(thrd))
> + if (!_thrd_active(thrd) && !_start(thrd))
> ret = -EIO;
> break;
>
My first reaction was that this just moves the race condition, but
after thinking about it carefully, it looks like a better (simpler) solution.
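To convince myself, here is a toy user-space model of the sequence. It is
only a sketch, not driver code: every toy_* name is made up for illustration,
and toy_active() assumes that _thrd_active() amounts to "software still has a
request marked as in flight", which I have not checked against the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names here are hypothetical, not from pl330.c. */
enum toy_state { TOY_STOPPED, TOY_EXECUTING };

struct toy_thread {
    enum toy_state hw_state; /* what the status register would report */
    int in_flight;           /* request software thinks is running, -1 if none */
    bool irq_pending;        /* completion raised but not yet handled */
    bool done[2];            /* set by the completion handler */
    int triggers[2];         /* times each request was sent to the hardware */
};

static void toy_init(struct toy_thread *t)
{
    t->hw_state = TOY_STOPPED;
    t->in_flight = -1;
    t->irq_pending = false;
    for (int i = 0; i < 2; i++) {
        t->done[i] = false;
        t->triggers[i] = 0;
    }
}

/* Send the first request not yet marked done to the hardware. */
static bool toy_trigger(struct toy_thread *t)
{
    for (int i = 0; i < 2; i++) {
        if (!t->done[i]) {
            t->triggers[i]++;
            t->in_flight = i;
            t->hw_state = TOY_EXECUTING;
            return true;
        }
    }
    return false;
}

/* Start the hardware only if it reports that it is stopped. */
static bool toy_start(struct toy_thread *t)
{
    if (t->hw_state == TOY_EXECUTING)
        return true;
    return toy_trigger(t);
}

/* Assumed stand-in for _thrd_active(): pure software bookkeeping. */
static bool toy_active(const struct toy_thread *t)
{
    return t->in_flight != -1;
}

/* Hardware finishes while interrupts are masked. */
static void toy_hw_finish(struct toy_thread *t)
{
    t->hw_state = TOY_STOPPED;
    t->irq_pending = true;
}

/* Deferred completion handler: mark done, then get going again. */
static void toy_update(struct toy_thread *t)
{
    if (!t->irq_pending)
        return;
    t->irq_pending = false;
    assert(t->in_flight >= 0);
    t->done[t->in_flight] = true;
    t->in_flight = -1;
    toy_start(t);
}

/* Replay the racy sequence; return how often request 0 was triggered. */
int toy_race(bool guarded)
{
    struct toy_thread t;

    toy_init(&t);
    toy_start(&t);      /* request 0 starts running */
    toy_hw_finish(&t);  /* it finishes inside the locked section */
    if (!guarded || !toy_active(&t))
        toy_start(&t);  /* the PL330_OP_START path */
    toy_update(&t);     /* interrupt is finally serviced */
    return t.triggers[0];
}
```

In this model, toy_race(false) sends request 0 to the hardware twice, while
toy_race(true) sends it once and leaves the second request to be started from
toy_update().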
> Could you please test if it fixes the issue?
Yes, I'll kick off runs later today and see how it goes. I don't have a
way to reproduce the bug consistently; I just do lots of DMA
transactions and eventually end up hitting it. I'll run it overnight
and see if it fixes it or not.
Cheers,
Javi