[PATCH v8 2/3] CMDQ: Mediatek CMDQ driver
Matthias Brugger
matthias.bgg at gmail.com
Tue Jun 14 03:17:30 PDT 2016
On 14/06/16 09:44, Horng-Shyang Liao wrote:
> Hi Matthias,
>
> On Wed, 2016-06-08 at 17:35 +0200, Matthias Brugger wrote:
>>
>> On 08/06/16 14:25, Horng-Shyang Liao wrote:
>>> Hi Matthias,
>>>
>>> On Wed, 2016-06-08 at 12:45 +0200, Matthias Brugger wrote:
>>>>
>>>> On 08/06/16 07:40, Horng-Shyang Liao wrote:
>>>>> Hi Matthias,
>>>>>
>>>>> On Tue, 2016-06-07 at 18:59 +0200, Matthias Brugger wrote:
>>>>>>
>>>>>> On 03/06/16 15:11, Matthias Brugger wrote:
>>>>>>>
>>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + smp_mb(); /* modify jump before enable thread */
>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + cmdq_thread_writel(thread, task->pa_base +
>>>>>>>>>>>>>>>> task->command_size,
>>>>>>>>>>>>>>>> + CMDQ_THR_END_ADDR);
>>>>>>>>>>>>>>>> + cmdq_thread_resume(thread);
>>>>>>>>>>>>>>>> + }
>>>>>>>>>>>>>>>> + list_move_tail(&task->list_entry, &thread->task_busy_list);
>>>>>>>>>>>>>>>> + spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>>>>>>>>>> +}
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
>>>>>>>>>>>>>>>> + struct cmdq_thread *thread, u32 irq_flag)
>>>>>>>>>>>>>>>> +{
>>>>>>>>>>>>>>>> + struct cmdq_task *task, *tmp, *curr_task = NULL;
>>>>>>>>>>>>>>>> + u32 curr_pa;
>>>>>>>>>>>>>>>> + struct cmdq_cb_data cmdq_cb_data;
>>>>>>>>>>>>>>>> + bool err;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + if (irq_flag & CMDQ_THR_IRQ_ERROR)
>>>>>>>>>>>>>>>> + err = true;
>>>>>>>>>>>>>>>> + else if (irq_flag & CMDQ_THR_IRQ_DONE)
>>>>>>>>>>>>>>>> + err = false;
>>>>>>>>>>>>>>>> + else
>>>>>>>>>>>>>>>> + return;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> + list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
>>>>>>>>>>>>>>>> + list_entry) {
>>>>>>>>>>>>>>>> + if (curr_pa >= task->pa_base &&
>>>>>>>>>>>>>>>> + curr_pa < (task->pa_base + task->command_size))
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What are you checking here? It seems as if you make some implcit
>>>>>>>>>>>>>>> assumptions about pa_base and the order of execution of
>>>>>>>>>>>>>>> commands in the
>>>>>>>>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any
>>>>>>>>>>>>>>> guarantees
>>>>>>>>>>>>>>> about dma_handle?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Check what is the current running task in this GCE thread.
>>>>>>>>>>>>>> 2. Yes.
>>>>>>>>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, physical addresses might be continous, but AFAIK there is no
>>>>>>>>>>>>> guarantee that the dma_handle address is steadily growing, when
>>>>>>>>>>>>> calling
>>>>>>>>>>>>> dma_alloc_coherent. And if I understand the code correctly, you
>>>>>>>>>>>>> use this
>>>>>>>>>>>>> assumption to decide if the task picked from task_busy_list is
>>>>>>>>>>>>> currently
>>>>>>>>>>>>> executing. So I think this mecanism is not working.
>>>>>>>>>>>>
>>>>>>>>>>>> I don't use dma_handle address, and just use physical addresses.
>>>>>>>>>>>> From CPU's point of view, tasks are linked by the busy list.
>>>>>>>>>>>> From GCE's point of view, tasks are linked by the JUMP command.
>>>>>>>>>>>>
>>>>>>>>>>>>> In which cases does the HW thread raise an interrupt.
>>>>>>>>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
>>>>>>>>>>>>
>>>>>>>>>>>> GCE will raise interrupt if any task is done or error.
>>>>>>>>>>>> However, GCE is fast, so CPU may get multiple done tasks
>>>>>>>>>>>> when it is running ISR.
>>>>>>>>>>>>
>>>>>>>>>>>> In case of error, that GCE thread will pause and raise interrupt.
>>>>>>>>>>>> So, CPU may get multiple done tasks and one error task.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I think we should reimplement the ISR mechanism. Can't we just read
>>>>>>>>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
>>>>>>>>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
>>>>>>>>>>> information from the handler to thread_fn, but that shouldn't be an
>>>>>>>>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
>>>>>>>>>>> there as short as possible. Traversing task_busy_list is expensive, so
>>>>>>>>>>> we need to do it in a thread context.
>>>>>>>>>>
>>>>>>>>>> Actually, our initial implementation is similar to your suggestion,
>>>>>>>>>> but display needs CMDQ to return callback function very precisely,
>>>>>>>>>> else display will drop frame.
>>>>>>>>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
>>>>>>>>>> and CMDQ needs to call callback function in ISR.
>>>>>>>>>> If we defer callback to workqueue, the time interval may be larger than
>>>>>>>>>> 32 ms.sometimes.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think the problem is, that you implemented the workqueue as a ordered
>>>>>>>>> workqueue, so there is no parallel processing. I'm still not sure why
>>>>>>>>> you need the workqueue to be ordered. Can you please explain.
>>>>>>>>
>>>>>>>> The order should be kept.
>>>>>>>> Let me use mouse cursor as an example.
>>>>>>>> If task 1 means move mouse cursor to point A, task 2 means point B,
>>>>>>>> and task 3 means point C, our expected result is A -> B -> C.
>>>>>>>> If the order is not kept, the result could become A -> C -> B.
>>>>>>>>
>>>>>>>
>>>>>>> Got it, thanks for the clarification.
>>>>>>>
>>>>>>
>>>>>> I think a way to get rid of the workqueue is to use a timer, which gets
>>>>>> programmed to the time a timeout in the first task in the busy list
>>>>>> would happen. Everytime we update the busy list (e.g. because of task
>>>>>> got finished by the thread), we update the timer. When the timer
>>>>>> triggers, which hopefully won't happen too often, we return timeout on
>>>>>> the busy list elements, until the time is lower then the actual time.
>>>>>>
>>>>>> At least with this we can reduce the data structures in this driver and
>>>>>> make it more lightweight.
>>>>>
>>>>> From my understanding, your proposed method can handle timeout case.
>>>>>
>>>>> However, the workqueue is also in charge of releasing tasks.
>>>>> Do you take releasing tasks into consideration by using the proposed
>>>>> timer method?
>>>>> Furthermore, I think the code will become more complex if we also use
>>>>> timer to implement releasing tasks.
>>>>>
>>>>
>>>> Can't we call
>>>> clk_disable_unprepare(cmdq->clock);
>>>> cmdq_task_release(task);
>>>> after invoking the callback?
>>>
>>> Do you mean just call these two functions in ISR?
>>> My major concern is dma_free_coherent() and kfree() in
>>> cmdq_task_release(task).
>>
>> Why do we need the dma calls at all? Can't we just calculate the
>> physical address using __pa(x)?
>
> I prefer to use dma_map_single/dma_unmap_single.
>
Can you please elaborate why you need this. We don't do dma, so we
should not use dma memory for this.
>>> Therefore, your suggestion is to use GFP_ATOMIC for both
>>> dma_alloc_coherent() and kzalloc(). Right?
>>
>> I don't think we need GFP_ATOMIC, the critical path will just free the
>> memory.
>
> I tested these two functions, and kfree was safe.
> However, dma_free_coherent raised BUG.
> BUG: failure at
> /mnt/host/source/src/third_party/kernel/v3.18/mm/vmalloc.c:1514/vunmap()!
Just a general hint. Please try to evaluate on a recent kernel. It looks
like as if you tried this on a v3.18 based one.
Best regards,
Matthias
> 1512 void vunmap(const void *addr)
> 1513 {
> 1514 BUG_ON(in_interrupt()); // <-- here
> 1515 might_sleep();
> 1516 if (addr)
> 1517 __vunmap(addr, 0);
> 1518 }
> 1519 EXPORT_SYMBOL(vunmap);
>
> Therefore, I plan to use kmalloc + dma_map_single instead of
> dma_alloc_coherent, and dma_unmap_single + kfree instead of
> dma_free_coherent.
>
> What do you think about the function replacement?
>
>>> If so, I can try to implement timeout by timer, and discuss with you
>>> if I have further questions.
>>>
>>
>> Sounds good :)
>>
>> Thanks,
>> Matthias
>
> Thanks,
> HS
>
>>>> Regrading the clock, wouldn't it be easier to handle the clock
>>>> enable/disable depending on the state of task_busy_list? I suppose we
>>>> can't as we would need to check the task_busy_list of all threads, right?
>>>>
>>>> Regards,
>>>> Matthias
>>>
>>> Thanks,
>>> HS
>>>
>
>
More information about the linux-arm-kernel
mailing list