[PATCH v8 2/3] CMDQ: Mediatek CMDQ driver

Fri Jun 3 06:11:01 PDT 2016

On 03/06/16 14:13, Horng-Shyang Liao wrote:
> Hi Matthias,
>
> On Fri, 2016-06-03 at 13:18 +0200, Matthias Brugger wrote:
>>
>> On 03/06/16 08:12, Horng-Shyang Liao wrote:
>>> Hi Mathias,
>>>
>>> Please see my inline reply.
>>>
>>> On Thu, 2016-06-02 at 10:46 +0200, Matthias Brugger wrote:
>>>>
>>>> On 01/06/16 11:57, Horng-Shyang Liao wrote:
>>>>> Hi Mathias,
>>>>>
>>>>> Please see my inline reply.
>>>>>
>>>>> On Tue, 2016-05-31 at 22:04 +0200, Matthias Brugger wrote:
>>>>>>
>>>>>> On 31/05/16 10:36, Horng-Shyang Liao wrote:
>>>>>>> Hi Mathias,
>>>>>>>
>>>>>>> Please see my inline reply.
>>>>>>>
>>>>>>> On Mon, 2016-05-30 at 17:31 +0200, Matthias Brugger wrote:
>>>>>>>>
>>>>>>>> On 30/05/16 05:19, HS Liao wrote:
>>>>>>>>> This patch is first version of Mediatek Command Queue(CMDQ) driver. The
>>>>>>>>> CMDQ is used to help read/write registers with critical time limitation,
>>>>>>>>> such as updating display configuration during the vblank. It controls
>>>>>>>>> Global Command Engine (GCE) hardware to achieve this requirement.
>>>>>>>>> Currently, CMDQ only supports display related hardwares, but we expect
>>>>>>>>> it can be extended to other hardwares for future requirements.
>>>>>>>>>
>>>>>>>>> Signed-off-by: HS Liao <hs.liao at mediatek.com>
>>>>>>>>> Signed-off-by: CK Hu <ck.hu at mediatek.com>
>>>>>>>>> ---
>>>>>>>>>       drivers/soc/mediatek/Kconfig    |  10 +
>>>>>>>>>       drivers/soc/mediatek/Makefile   |   1 +
>>>>>>>>>       drivers/soc/mediatek/mtk-cmdq.c | 943 ++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>       include/soc/mediatek/cmdq.h     | 197 +++++++++
>>>>>>>>>       4 files changed, 1151 insertions(+)
>>>>>>>>>       create mode 100644 drivers/soc/mediatek/mtk-cmdq.c
>>>>>>>>>       create mode 100644 include/soc/mediatek/cmdq.h
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/soc/mediatek/Kconfig b/drivers/soc/mediatek/Kconfig
>>>>>>>>> index 0a4ea80..c4ad75c 100644
>>>>>>>>> --- a/drivers/soc/mediatek/Kconfig
>>>>>>>>> +++ b/drivers/soc/mediatek/Kconfig
>>>>>>>>> @@ -1,6 +1,16 @@
>>>>>>>>>       #
>>>>>>>>>       # MediaTek SoC drivers
>>>>>>>>>       #
>>>>>>>>> +config MTK_CMDQ
>>>>>>>>> +	bool "MediaTek CMDQ Support"
>>>>>>>>> +	depends on ARCH_MEDIATEK || COMPILE_TEST
>>>>>>
>>>>>> depends on ARM64 ?
>>>>>
>>>>> Will add ARM64.
>>>>>
>>>>>>>>> +	select MTK_INFRACFG
>>>>>>>>> +	help
>>>>>>>>> +	  Say yes here to add support for the MediaTek Command Queue (CMDQ)
>>>>>>>>> +	  driver. The CMDQ is used to help read/write registers with critical
>>>>>>>>> +	  time limitation, such as updating display configuration during the
>>>>>>>>> +	  vblank.
>>>>>>>>> +
>>>>>>>>>       config MTK_INFRACFG
>>>>>>>>>       	bool "MediaTek INFRACFG Support"
>>>>>>>>>       	depends on ARCH_MEDIATEK || COMPILE_TEST
>>>>>>>>> diff --git a/drivers/soc/mediatek/Makefile b/drivers/soc/mediatek/Makefile
>>>>>>>>> index 12998b0..f7397ef 100644
>>>>>>>>> --- a/drivers/soc/mediatek/Makefile
>>>>>>>>> +++ b/drivers/soc/mediatek/Makefile
>>>>>>>>> @@ -1,3 +1,4 @@
>>>>>>>>> +obj-$(CONFIG_MTK_CMDQ) += mtk-cmdq.o
>>>>>>>>>       obj-$(CONFIG_MTK_INFRACFG) += mtk-infracfg.o
>>>>>>>>>       obj-$(CONFIG_MTK_PMIC_WRAP) += mtk-pmic-wrap.o
>>>>>>>>>       obj-$(CONFIG_MTK_SCPSYS) += mtk-scpsys.o
>>>>>>>>> diff --git a/drivers/soc/mediatek/mtk-cmdq.c b/drivers/soc/mediatek/mtk-cmdq.c
>>>>>>>>> new file mode 100644
>>>>>>>>> index 0000000..e9d6e1c
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/drivers/soc/mediatek/mtk-cmdq.c
>>>>>>>>> @@ -0,0 +1,943 @@
>>>>>>>>> +/*
>>>>>>>>> + * Copyright (c) 2015 MediaTek Inc.
>>>>>>>>> + *
>>>>>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>>>>>> + * it under the terms of the GNU General Public License version 2 as
>>>>>>>>> + * published by the Free Software Foundation.
>>>>>>>>> + *
>>>>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>>>>> + * GNU General Public License for more details.
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +#include <linux/clk.h>
>>>>>>>>> +#include <linux/clk-provider.h>
>>>>>>>>> +#include <linux/completion.h>
>>>>>>>>> +#include <linux/dma-mapping.h>
>>>>>>>>> +#include <linux/errno.h>
>>>>>>>>> +#include <linux/interrupt.h>
>>>>>>>>> +#include <linux/iopoll.h>
>>>>>>>>> +#include <linux/kernel.h>
>>>>>>>>> +#include <linux/kthread.h>
>>>>>>>>> +#include <linux/module.h>
>>>>>>>>> +#include <linux/mutex.h>
>>>>>>>>> +#include <linux/of_address.h>
>>>>>>>>> +#include <linux/of_irq.h>
>>>>>>>>> +#include <linux/platform_device.h>
>>>>>>>>> +#include <linux/slab.h>
>>>>>>>>> +#include <linux/spinlock.h>
>>>>>>>>> +#include <linux/suspend.h>
>>>>>>>>> +#include <linux/workqueue.h>
>>>>>>>>> +#include <soc/mediatek/cmdq.h>
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_INITIAL_CMD_BLOCK_SIZE	PAGE_SIZE
>>>>>>>>> +#define CMDQ_INST_SIZE			8 /* instruction is 64-bit */
>>>>>>>>> +#define CMDQ_TIMEOUT_MS			1000
>>>>>>>>> +#define CMDQ_IRQ_MASK			0xffff
>>>>>>>>> +#define CMDQ_DRIVER_DEVICE_NAME		"mtk_cmdq"
>>>>>>>>> +#define CMDQ_CLK_NAME			"gce"
>>>>>>>>
>>>>>>>> We can put the names in directly to un-bloat the defines.
>>>>>>>
>>>>>>> I will use the names directly and remove defines.
>>>>>>>
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_CURR_IRQ_STATUS		0x010
>>>>>>>>> +#define CMDQ_CURR_LOADED_THR		0x018
>>>>>>>>> +#define CMDQ_THR_SLOT_CYCLES		0x030
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_THR_BASE			0x100
>>>>>>>>> +#define CMDQ_THR_SHIFT			0x080
>>>>>>>>
>>>>>>>> Wouldn't be CMDQ_THR_SIZE more accurate?
>>>>>>>
>>>>>>> Will rename it.
>>>>>>>
>>>>>>>>> +#define CMDQ_THR_WARM_RESET		0x00
>>>>>>>>> +#define CMDQ_THR_ENABLE_TASK		0x04
>>>>>>>>> +#define CMDQ_THR_SUSPEND_TASK		0x08
>>>>>>>>> +#define CMDQ_THR_CURR_STATUS		0x0c
>>>>>>>>> +#define CMDQ_THR_IRQ_STATUS		0x10
>>>>>>>>> +#define CMDQ_THR_IRQ_ENABLE		0x14
>>>>>>>>> +#define CMDQ_THR_CURR_ADDR		0x20
>>>>>>>>> +#define CMDQ_THR_END_ADDR		0x24
>>>>>>>>> +#define CMDQ_THR_CFG			0x40
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_THR_ENABLED		0x1
>>>>>>>>> +#define CMDQ_THR_DISABLED		0x0
>>>>>>>>> +#define CMDQ_THR_SUSPEND		0x1
>>>>>>>>> +#define CMDQ_THR_RESUME			0x0
>>>>>>>>> +#define CMDQ_THR_STATUS_SUSPENDED	BIT(1)
>>>>>>>>> +#define CMDQ_THR_DO_WARM_RESET		BIT(0)
>>>>>>>>> +#define CMDQ_THR_ACTIVE_SLOT_CYCLES	0x3200
>>>>>>>>> +#define CMDQ_THR_PRIORITY		3
>>>>>>>>> +#define CMDQ_THR_IRQ_DONE		0x1
>>>>>>>>> +#define CMDQ_THR_IRQ_ERROR		0x12
>>>>>>>>> +#define CMDQ_THR_IRQ_EN			0x13 /* done + error */
>>>>>>>>
>>>>>>>> #define CMDQ_THR_IRQ_EN (CMDQ_THR_IRQ_ERROR | CMDQ_THR_IRQ_DONE)
>>>>>>>
>>>>>>> Will do.
>>>>>>>
>>>>>>>>> +#define CMDQ_THR_IRQ_MASK		0x13
>>>>>>>>
>>>>>>>> never used.
>>>>>>>
>>>>>>> Will remove.
>>>>>>>
>>>>>>>>> +#define CMDQ_THR_EXECUTING		BIT(31)
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_ARG_A_WRITE_MASK		0xffff
>>>>>>>>> +#define CMDQ_SUBSYS_MASK		0x1f
>>>>>>>>> +#define CMDQ_OP_CODE_MASK		0xff000000
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_OP_CODE_SHIFT		24
>>>>>>>>
>>>>>>>> Couldn't we connect the mask with the shift, or aren't they related?
>>>>>>>>
>>>>>>>> #define CMDQ_OP_CODE_MASK (0xff << CMDQ_OP_CODE_SHIFT)
>>>>>>>
>>>>>>> Will do.
>>>>>>>
>>>>>>>>> +#define CMDQ_SUBSYS_SHIFT		16
>>>>>>>>> +
>>>>>>>>> +#define CMDQ_WRITE_ENABLE_MASK		BIT(0)
>>>>>>>>> +#define CMDQ_JUMP_BY_OFFSET		0x10000000
>>>>>>>>> +#define CMDQ_JUMP_BY_PA			0x10000001
>>>>>>>>> +#define CMDQ_JUMP_PASS			CMDQ_INST_SIZE
>>>>>>>>> +#define CMDQ_WFE_UPDATE			BIT(31)
>>>>>>>>> +#define CMDQ_WFE_WAIT			BIT(15)
>>>>>>>>> +#define CMDQ_WFE_WAIT_VALUE		0x1
>>>>>>>>> +#define CMDQ_EOC_IRQ_EN			BIT(0)
>>>>>>>>> +
>>>>>>>>> +enum cmdq_thread_index {
>>>>>>>>> +	CMDQ_THR_DISP_MAIN_IDX,	/* main */
>>>>>>>>> +	CMDQ_THR_DISP_SUB_IDX,	/* sub */
>>>>>>>>> +	CMDQ_THR_DISP_MISC_IDX,	/* misc */
>>>>>>>>> +	CMDQ_THR_MAX_COUNT,	/* max */
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/*
>>>>>>>>> + * CMDQ_CODE_MOVE:
>>>>>>>>> + *   move value into internal register as mask
>>>>>>>>> + *   format: op mask
>>>>>>>>> + * CMDQ_CODE_WRITE:
>>>>>>>>> + *   write value into target register
>>>>>>>>> + *   format: op subsys address value
>>>>>>>>> + * CMDQ_CODE_JUMP:
>>>>>>>>> + *   jump by offset
>>>>>>>>> + *   format: op offset
>>>>>>>>> + * CMDQ_CODE_WFE:
>>>>>>>>> + *   wait for event and clear
>>>>>>>>> + *   it is just clear if no wait
>>>>>>>>> + *   format: [wait]  op event update:1 to_wait:1 wait:1
>>>>>>>>> + *           [clear] op event update:1 to_wait:0 wait:0
>>>>>>>>> + * CMDQ_CODE_EOC:
>>>>>>>>> + *   end of command
>>>>>>>>> + *   format: op irq_flag
>>>>>>>>> + */
>>>>>>>>
>>>>>>>> I think we need more documentation of how this command queue engine is
>>>>>>>> working. If not, I think it will be really complicated to understand how
>>>>>>>> to use this.
>>>>>>>>
>>>>>>>>> +enum cmdq_code {
>>>>>>>>> +	CMDQ_CODE_MOVE = 0x02,
>>>>>>>>> +	CMDQ_CODE_WRITE = 0x04,
>>>>>>>>> +	CMDQ_CODE_JUMP = 0x10,
>>>>>>>>> +	CMDQ_CODE_WFE = 0x20,
>>>>>>>>> +	CMDQ_CODE_EOC = 0x40,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +enum cmdq_task_state {
>>>>>>>>> +	TASK_STATE_BUSY, /* running on a GCE thread */
>>>>>>>>> +	TASK_STATE_ERROR,
>>>>>>>>> +	TASK_STATE_DONE,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct cmdq_task_cb {
>>>>>>>>> +	cmdq_async_flush_cb	cb;
>>>>>>>>> +	void			*data;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct cmdq_thread {
>>>>>>>>> +	void __iomem		*base;
>>>>>>>>> +	struct list_head	task_busy_list;
>>>>>>>>> +	wait_queue_head_t	wait_task_done;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct cmdq_task {
>>>>>>>>> +	struct cmdq		*cmdq;
>>>>>>>>> +	struct list_head	list_entry;
>>>>>>>>> +	enum cmdq_task_state	task_state;
>>>>>>>>> +	void			*va_base;
>>>>>>>>> +	dma_addr_t		pa_base;
>>>>>>>>> +	u64			engine_flag;
>>>>>>>>> +	size_t			command_size;
>>>>>>>>> +	u32			num_cmd;
>>>>>>>>
>>>>>>>> num_cmd is directly connected to command_size. I prefer to just keep
>>>>>>>> num_cmd and calculate the command_size where necessary.
>>>>>>>
>>>>>>> After I trace code, I prefer to keep command_size and calculate num_cmd
>>>>>>> where necessary. What do you think?
>>>>>>>
>>>>>>
>>>>>> I suppose you prefer this, as you are writing to the GCE depending on
>>>>>> the command_size. I think it is worth to create a macro for the
>>>>>> calculation of the number of commands, to make the code more readable.
>>>>>> Would be nice if you would just pass cmdq_task to it and it would return
>>>>>> the number. Just as an idea.
>>>>>
>>>>> Will add macro.
>>>>>
>>>>>>>>> +	struct cmdq_thread	*thread;
>>>>>>>>> +	struct cmdq_task_cb	cb;
>>>>>>>>> +	struct work_struct	release_work;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct cmdq {
>>>>>>>>> +	struct device		*dev;
>>>>>>>>> +	void __iomem		*base;
>>>>>>>>> +	u32			irq;
>>>>>>>>> +	struct workqueue_struct	*task_release_wq;
>>>>>>>>> +	struct cmdq_thread	thread[CMDQ_THR_MAX_COUNT];
>>>>>>>>> +	struct mutex		task_mutex;	/* for task */
>>>>>>>>> +	spinlock_t		exec_lock;	/* for exec */
>>>>>>>>> +	struct clk		*clock;
>>>>>>>>> +	bool			suspended;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct cmdq_subsys {
>>>>>>>>> +	u32	base;
>>>>>>>>> +	int	id;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static const struct cmdq_subsys gce_subsys[] = {
>>>>>>>>> +	{0x1400, 1},
>>>>>>>>> +	{0x1401, 2},
>>>>>>>>> +	{0x1402, 3},
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static int cmdq_subsys_base_to_id(u32 base)
>>>>>>>>> +{
>>>>>>>>> +	int i;
>>>>>>>>> +
>>>>>>>>> +	for (i = 0; i < ARRAY_SIZE(gce_subsys); i++)
>>>>>>>>> +		if (gce_subsys[i].base == base)
>>>>>>>>> +			return gce_subsys[i].id;
>>>>>>>>> +	return -EFAULT;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_eng_get_thread(u64 flag)
>>>>>>>>> +{
>>>>>>>>> +	if (flag & BIT_ULL(CMDQ_ENG_DISP_DSI0))
>>>>>>>>> +		return CMDQ_THR_DISP_MAIN_IDX;
>>>>>>>>> +	else if (flag & BIT_ULL(CMDQ_ENG_DISP_DPI0))
>>>>>>>>> +		return CMDQ_THR_DISP_SUB_IDX;
>>>>>>>>> +	else
>>>>>>>>> +		return CMDQ_THR_DISP_MISC_IDX;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_task_release(struct cmdq_task *task)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = task->cmdq;
>>>>>>>>> +
>>>>>>>>> +	dma_free_coherent(cmdq->dev, task->command_size, task->va_base,
>>>>>>>>> +			  task->pa_base);
>>>>>>>>> +	kfree(task);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static struct cmdq_task *cmdq_task_acquire(struct cmdq_rec *rec,
>>>>>>>>> +					   struct cmdq_task_cb cb)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = rec->cmdq;
>>>>>>>>> +	struct device *dev = cmdq->dev;
>>>>>>>>> +	struct cmdq_task *task;
>>>>>>>>> +
>>>>>>>>> +	task = kzalloc(sizeof(*task), GFP_KERNEL);
>>>>>>>>> +	INIT_LIST_HEAD(&task->list_entry);
>>>>>>>>> +	task->va_base = dma_alloc_coherent(dev, rec->command_size,
>>>>>>>>> +					   &task->pa_base, GFP_KERNEL);
>>>>>>>>> +	if (!task->va_base) {
>>>>>>>>> +		dev_err(dev, "allocate command buffer failed\n");
>>>>>>>>> +		kfree(task);
>>>>>>>>> +		return NULL;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	task->cmdq = cmdq;
>>>>>>>>> +	task->command_size = rec->command_size;
>>>>>>>>> +	task->engine_flag = rec->engine_flag;
>>>>>>>>> +	task->task_state = TASK_STATE_BUSY;
>>>>>>>>> +	task->cb = cb;
>>>>>>>>> +	memcpy(task->va_base, rec->buf, rec->command_size);
>>>>>>>>> +	task->num_cmd = task->command_size / CMDQ_INST_SIZE;
>>>>>>>>> +	return task;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_thread_writel(struct cmdq_thread *thread, u32 value,
>>>>>>>>> +			       u32 offset)
>>>>>>>>> +{
>>>>>>>>> +	writel(value, thread->base + offset);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static u32 cmdq_thread_readl(struct cmdq_thread *thread, u32 offset)
>>>>>>>>> +{
>>>>>>>>> +	return readl(thread->base + offset);
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> We can get rid of cmdq_thread_readl/writel.
>>>>>>>
>>>>>>> Will do.
>>>>>>>
>>>>>>>>> +
>>>>>>>>> +static int cmdq_thread_suspend(struct cmdq *cmdq, struct cmdq_thread *thread)
>>>>>>>>> +{
>>>>>>>>> +	u32 status;
>>>>>>>>> +
>>>>>>>>> +	cmdq_thread_writel(thread, CMDQ_THR_SUSPEND, CMDQ_THR_SUSPEND_TASK);
>>>>>>>>> +
>>>>>>>>> +	/* If already disabled, treat as suspended successful. */
>>>>>>>>> +	if (!(cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) &
>>>>>>>>> +	      CMDQ_THR_ENABLED))
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	if (readl_poll_timeout_atomic(thread->base + CMDQ_THR_CURR_STATUS,
>>>>>>>>> +			status, status & CMDQ_THR_STATUS_SUSPENDED, 0, 10)) {
>>>>>>>>> +		dev_err(cmdq->dev, "suspend GCE thread 0x%x failed\n",
>>>>>>>>> +			(u32)(thread->base - cmdq->base));
>>>>>>>>> +		return -EFAULT;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_thread_resume(struct cmdq_thread *thread)
>>>>>>>>> +{
>>>>>>>>> +	cmdq_thread_writel(thread, CMDQ_THR_RESUME, CMDQ_THR_SUSPEND_TASK);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_thread_reset(struct cmdq *cmdq, struct cmdq_thread *thread)
>>>>>>>>> +{
>>>>>>>>> +	u32 warm_reset;
>>>>>>>>> +
>>>>>>>>> +	cmdq_thread_writel(thread, CMDQ_THR_DO_WARM_RESET, CMDQ_THR_WARM_RESET);
>>>>>>>>> +	if (readl_poll_timeout_atomic(thread->base + CMDQ_THR_WARM_RESET,
>>>>>>>>> +			warm_reset, !(warm_reset & CMDQ_THR_DO_WARM_RESET),
>>>>>>>>> +			0, 10)) {
>>>>>>>>> +		dev_err(cmdq->dev, "reset GCE thread 0x%x failed\n",
>>>>>>>>> +			(u32)(thread->base - cmdq->base));
>>>>>>>>> +		return -EFAULT;
>>>>>>>>> +	}
>>>>>>>>> +	writel(CMDQ_THR_ACTIVE_SLOT_CYCLES, cmdq->base + CMDQ_THR_SLOT_CYCLES);
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_thread_disable(struct cmdq *cmdq, struct cmdq_thread *thread)
>>>>>>>>> +{
>>>>>>>>> +	cmdq_thread_reset(cmdq, thread);
>>>>>>>>> +	cmdq_thread_writel(thread, CMDQ_THR_DISABLED, CMDQ_THR_ENABLE_TASK);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/* notify GCE to re-fetch commands by setting GCE thread PC */
>>>>>>>>> +static void cmdq_thread_invalidate_fetched_data(struct cmdq_thread *thread)
>>>>>>>>> +{
>>>>>>>>> +	cmdq_thread_writel(thread,
>>>>>>>>> +			   cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR),
>>>>>>>>> +			   CMDQ_THR_CURR_ADDR);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_task_insert_into_thread(struct cmdq_task *task)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_thread *thread = task->thread;
>>>>>>>>> +	struct cmdq_task *prev_task = list_last_entry(
>>>>>>>>> +			&thread->task_busy_list, typeof(*task), list_entry);
>>>>>>>>> +	u64 *prev_task_base = prev_task->va_base;
>>>>>>>>> +
>>>>>>>>> +	/* let previous task jump to this task */
>>>>>>>>> +	prev_task_base[prev_task->num_cmd - 1] = (u64)CMDQ_JUMP_BY_PA << 32 |
>>>>>>>>> +						 task->pa_base;
>>>>>>>>> +
>>>>>>>>> +	cmdq_thread_invalidate_fetched_data(thread);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/* we assume tasks in the same display GCE thread are waiting the same event. */
>>>>>>>>> +static void cmdq_task_remove_wfe(struct cmdq_task *task)
>>>>>>>>> +{
>>>>>>>>> +	u32 wfe_option = CMDQ_WFE_UPDATE | CMDQ_WFE_WAIT | CMDQ_WFE_WAIT_VALUE;
>>>>>>>>> +	u32 wfe_op = CMDQ_CODE_WFE << CMDQ_OP_CODE_SHIFT;
>>>>>>>>> +	u32 *base = task->va_base;
>>>>>>>>> +	u32 num_cmd = task->num_cmd << 1;
>>>>>>>>> +	int i;
>>>>>>>>> +
>>>>>>>>> +	for (i = 0; i < num_cmd; i += 2)
>>>>>>>>> +		if (base[i] == wfe_option &&
>>>>>>>>> +		    (base[i + 1] & CMDQ_OP_CODE_MASK) == wfe_op) {
>>>>>>>>> +			base[i] = CMDQ_JUMP_PASS;
>>>>>>>>> +			base[i + 1] = CMDQ_JUMP_BY_OFFSET;
>>>>>>>>> +		}
>>>>>>>>
>>>>>>>> After using the command buffer as a void pointer a u64 pointer, we now
>>>>>>>> reference to it as u32. I would prefer to explain here, how the command
>>>>>>>> looks like we are searching for and use a for loop passing task->num_cmd
>>>>>>>> instead.
>>>>>>>
>>>>>>> Will use u64* to rewrite the above code.
>>>>>>>
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_task_exec(struct cmdq_task *task, struct cmdq_thread *thread)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = task->cmdq;
>>>>>>>>> +	unsigned long flags;
>>>>>>>>> +	unsigned long curr_pa, end_pa;
>>>>>>>>> +
>>>>>>>>> +	WARN_ON(clk_prepare_enable(cmdq->clock) < 0);
>>>>>>>>> +	spin_lock_irqsave(&cmdq->exec_lock, flags);
>>>>>>>>
>>>>>>>> cmdq_task_exec is called with cmdq->task_mutex held, so why do we need
>>>>>>>> the spin_lock here? Can't we just use one of the two?
>>>>>>>
>>>>>>> We can drop task_mutex, but we will get some side effects.
>>>>>>> 1. exec_lock needs to include more code, but I think it is not good for
>>>>>>>        spinlock.
>>>>>>> 2. In cmdq_rec_flush_async(), task_mutex needs to protect
>>>>>>>        (1) cmdq->suspended, (2) cmdq_task_exec(), and
>>>>>>>        (3) cmdq_task_wait_release_schedule().
>>>>>>>        If we drop task_mutex, we have to put cmdq->suspended if condition
>>>>>>>        just before cmdq_task_exec() and inside exec_lock, and we have to
>>>>>>>        release task and its command buffer if error. This will let flow
>>>>>>>        become more complex and enlarge code size.
>>>>>>>
>>>>>>> What do you think?
>>>>>>
>>>>>> Why do you need to protect cmdq_task_wait_release_schedule? We don't
>>>>>> care about the order of the workqueue elements, do we?
>>>>>
>>>>> According to CK's comment, we have to ensure order to remove previous
>>>>> task.
>>>>> http://lists.infradead.org/pipermail/linux-mediatek/2016-May/005547.html
>>>>>
>>>>>> As far as I understand you would need to protect cmdq_task_acquire as
>>>>>> well, to "ensure" continously growing pa_base. More on that below.
>>>>>
>>>>> We need to ensure continuous physical addresses in a task, but we don't
>>>>> need to ensure continuous physical addresses between tasks.
>>>>> So, I think it is unnecessary to protect by mutex or spinlock.
>>>>>
>>>>
>>>> True, I didn't get that.
>>>>
>>>>>>>
>>>>>>>>> +	task->thread = thread;
>>>>>>>>> +	task->task_state = TASK_STATE_BUSY;
>>>>>>>>
>>>>>>>> That was already set in cmdq_task_acquire, why do we need to set it here
>>>>>>>> again?
>>>>>>>
>>>>>>> Will drop it.
>>>>>>>
>>>>>>
>>>>>> Yeah, but I think it makes more sense to drop it in cmdq_task_acquire
>>>>>> instead.
>>>>>
>>>>> Will drop it in cmdq_task_acquire instead.
>>>>>
>>>>>>>>> +	if (list_empty(&thread->task_busy_list)) {
>>>>>>>>> +		WARN_ON(cmdq_thread_reset(cmdq, thread) < 0);
>>>>>>>>> +
>>>>>>>>> +		cmdq_thread_writel(thread, task->pa_base, CMDQ_THR_CURR_ADDR);
>>>>>>>>> +		cmdq_thread_writel(thread, task->pa_base + task->command_size,
>>>>>>>>> +				   CMDQ_THR_END_ADDR);
>>>>>>>>> +		cmdq_thread_writel(thread, CMDQ_THR_PRIORITY, CMDQ_THR_CFG);
>>>>>>>>> +		cmdq_thread_writel(thread, CMDQ_THR_IRQ_EN,
>>>>>>>>> +				   CMDQ_THR_IRQ_ENABLE);
>>>>>>>>> +
>>>>>>>>> +		cmdq_thread_writel(thread, CMDQ_THR_ENABLED,
>>>>>>>>> +				   CMDQ_THR_ENABLE_TASK);
>>>>>>>>> +	} else {
>>>>>>>>> +		WARN_ON(cmdq_thread_suspend(cmdq, thread) < 0);
>>>>>>>>> +
>>>>>>>>> +		/*
>>>>>>>>> +		 * check boundary condition
>>>>>>>>> +		 * PC = END - 8, EOC is executed
>>>>>>>>> +		 * PC = END, all CMDs are executed
>>>>>>>>> +		 */
>>>>>>>>> +		curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
>>>>>>>>> +		end_pa = cmdq_thread_readl(thread, CMDQ_THR_END_ADDR);
>>>>>>>>> +		if (curr_pa == end_pa - 8 || curr_pa == end_pa) {
>>>>>>>>
>>>>>>>> 8 refers to CMDQ_INST_SIZE, right?
>>>>>>>
>>>>>>> Yes, I will use CMDQ_INST_SIZE.
>>>>>>>
>>>>>>>>> +			/* set to this task directly */
>>>>>>>>> +			cmdq_thread_writel(thread, task->pa_base,
>>>>>>>>> +					   CMDQ_THR_CURR_ADDR);
>>>>>>>>> +		} else {
>>>>>>>>> +			cmdq_task_insert_into_thread(task);
>>>>>>>>> +
>>>>>>>>> +			if (thread == &cmdq->thread[CMDQ_THR_DISP_MAIN_IDX] ||
>>>>>>>>> +			    thread == &cmdq->thread[CMDQ_THR_DISP_SUB_IDX])
>>>>>>>>> +				cmdq_task_remove_wfe(task);
>>>>>>>>
>>>>>>>> We could do this check using the task->engine_flag, I suppose that's
>>>>>>>> easier to undestand then.
>>>>>>>
>>>>>>> Will use task->engine_flag.
>>>>>>>
>>>>>>>>> +
>>>>>>>>> +			smp_mb(); /* modify jump before enable thread */
>>>>>>>>> +		}
>>>>>>>>> +
>>>>>>>>> +		cmdq_thread_writel(thread, task->pa_base + task->command_size,
>>>>>>>>> +				   CMDQ_THR_END_ADDR);
>>>>>>>>> +		cmdq_thread_resume(thread);
>>>>>>>>> +	}
>>>>>>>>> +	list_move_tail(&task->list_entry, &thread->task_busy_list);
>>>>>>>>> +	spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_handle_error_done(struct cmdq *cmdq,
>>>>>>>>> +				   struct cmdq_thread *thread, u32 irq_flag)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_task *task, *tmp, *curr_task = NULL;
>>>>>>>>> +	u32 curr_pa;
>>>>>>>>> +	struct cmdq_cb_data cmdq_cb_data;
>>>>>>>>> +	bool err;
>>>>>>>>> +
>>>>>>>>> +	if (irq_flag & CMDQ_THR_IRQ_ERROR)
>>>>>>>>> +		err = true;
>>>>>>>>> +	else if (irq_flag & CMDQ_THR_IRQ_DONE)
>>>>>>>>> +		err = false;
>>>>>>>>> +	else
>>>>>>>>> +		return;
>>>>>>>>> +
>>>>>>>>> +	curr_pa = cmdq_thread_readl(thread, CMDQ_THR_CURR_ADDR);
>>>>>>>>> +
>>>>>>>>> +	list_for_each_entry_safe(task, tmp, &thread->task_busy_list,
>>>>>>>>> +				 list_entry) {
>>>>>>>>> +		if (curr_pa >= task->pa_base &&
>>>>>>>>> +		    curr_pa < (task->pa_base + task->command_size))
>>>>>>>>
>>>>>>>> What are you checking here? It seems as if you make some implcit
>>>>>>>> assumptions about pa_base and the order of execution of commands in the
>>>>>>>> thread. Is it save to do so? Does dma_alloc_coherent give any guarantees
>>>>>>>> about dma_handle?
>>>>>>>
>>>>>>> 1. Check what is the current running task in this GCE thread.
>>>>>>> 2. Yes.
>>>>>>> 3. Yes, CMDQ doesn't use iommu, so physical address is continuous.
>>>>>>>
>>>>>>
>>>>>> Yes, physical addresses might be continous, but AFAIK there is no
>>>>>> guarantee that the dma_handle address is steadily growing, when calling
>>>>>> dma_alloc_coherent. And if I understand the code correctly, you use this
>>>>>> assumption to decide if the task picked from task_busy_list is currently
>>>>>> executing. So I think this mecanism is not working.
>>>>>
>>>>> I don't use dma_handle address, and just use physical addresses.
>>>>>    From CPU's point of view, tasks are linked by the busy list.
>>>>>    From GCE's point of view, tasks are linked by the JUMP command.
>>>>>
>>>>>> In which cases does the HW thread raise an interrupt.
>>>>>> In case of error. When does CMDQ_THR_IRQ_DONE get raised?
>>>>>
>>>>> GCE will raise interrupt if any task is done or error.
>>>>> However, GCE is fast, so CPU may get multiple done tasks
>>>>> when it is running ISR.
>>>>>
>>>>> In case of error, that GCE thread will pause and raise interrupt.
>>>>> So, CPU may get multiple done tasks and one error task.
>>>>>
>>>>
>>>> I think we should reimplement the ISR mechanism. Can't we just read
>>>> CURR_IRQ_STATUS and THR_IRQ_STATUS in the handler and leave
>>>> cmdq_handle_error_done to the thread_fn? You will need to pass
>>>> information from the handler to thread_fn, but that shouldn't be an
>>>> issue. AFAIK interrupts are disabled in the handler, so we should stay
>>>> there as short as possible. Traversing task_busy_list is expensive, so
>>>> we need to do it in a thread context.
>>>
>>> Actually, our initial implementation is similar to your suggestion,
>>> but display needs CMDQ to return callback function very precisely,
>>> else display will drop frame.
>>> For display, CMDQ interrupt will be raised every 16 ~ 17 ms,
>>> and CMDQ needs to call callback function in ISR.
>>> If we defer callback to workqueue, the time interval may be larger than
>>> 32 ms.sometimes.
>>>
>>
>> I think the problem is, that you implemented the workqueue as a ordered
>> workqueue, so there is no parallel processing. I'm still not sure why
>> you need the workqueue to be ordered. Can you please explain.
>
> The order should be kept.
> Let me use mouse cursor as an example.
> If task 1 means move mouse cursor to point A, task 2 means point B,
> and task 3 means point C, our expected result is A -> B -> C.
> If the order is not kept, the result could become A -> C -> B.
>

Got it, thanks for the clarification.

Matthias

>>>> I keep thinking about how to get rid of the two data structures,
>>>> task_busy_list and the task_release_wq. We need the latter for the only
>>>> sake of getting a timeout.
>>>>
>>>> Did you have a look on how the mailbox framework handles this?
>>>> By the way, what is the reason to not implement the whole driver as a
>>>> mailbox controller? For me, this driver looks like a good fit.
>>>
>>> CMDQ needs to encode commands for GCE hardware. We think this behavior
>>> should be put in CMDQ driver, and client just call CMDQ functions.
>>> Therefore, if we want to use mailbox framework, cmdq_rec must be
>>> mailbox client, and the others must be mailbox controller.
>>>
>>
>> You mean the functions to fill the cmdq_rec and execute it?
>> I think this should be part of the driver.
>
> Yes.
>
>> Jassi, can you have a look on the interface this driver exports [0].
>> They are needed to actually create the message which will be send.
>> Could something like this be part of a mailbox driver?
>>
>> [0] https://patchwork.kernel.org/patch/9140221/
>>
>>> However, if we use mailbox controller, CMDQ driver still needs to
>>> control busy list for each GCE thread, and use workqueue to handle
>>> timeout tasks.
>>>
>>
>> Let me summarize my ideas around this driver:
>> When we enter the ISR, we know that all task in task_busy_list before
>> the entry which represents curr_task can be set to TASK_STATE_DONE.
>> The curr_task could be TASK_STATE_ERROR if the corresponding bit in the
>> irq status registers is set.
>> Do we need to call the callback in the same order as the tasks got
>> dispatched to the HW thread? If not, we could try to call all this
>> callbacks in a multithreaded workqueue.
>
> Yes, we should keep order.
>
>> Regards,
>> Matthias
>
> Thanks,
> HS
>
>>> The only thing that we can borrow from mailbox framework is the send
>>> (CMDQ flush) and receive (CMDQ callback) interface, However, we don't
>>> think we can gain many benefits from it, and we have some overheads to
>>> conform to mailbox interface.
>>>
>>>
>>>>
>>>>>>>>> +			curr_task = task;
>>>>>>>>> +		if (task->cb.cb) {
>>>>>>>>> +			cmdq_cb_data.err = curr_task ? err : false;
>>>>>>>>> +			cmdq_cb_data.data = task->cb.data;
>>>>>>>>> +			task->cb.cb(cmdq_cb_data);
>>>>>>>>> +		}
>>>>>>>>> +		task->task_state = (curr_task && err) ? TASK_STATE_ERROR :
>>>>>>>>> +				TASK_STATE_DONE;
>>>>>>>>> +		list_del(&task->list_entry);
>>>>>>>>> +		if (curr_task)
>>>>>>>>> +			break;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	wake_up(&thread->wait_task_done);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_thread_irq_handler(struct cmdq *cmdq, int tid)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_thread *thread = &cmdq->thread[tid];
>>>>>>>>> +	unsigned long flags = 0L;
>>>>>>>>> +	u32 irq_flag;
>>>>>>>>> +
>>>>>>>>> +	spin_lock_irqsave(&cmdq->exec_lock, flags);
>>>>>>>>> +
>>>>>>>>> +	irq_flag = cmdq_thread_readl(thread, CMDQ_THR_IRQ_STATUS);
>>>>>>>>> +	cmdq_thread_writel(thread, ~irq_flag, CMDQ_THR_IRQ_STATUS);
>>>>>>>>> +
>>>>>>>>> +	/*
>>>>>>>>> +	 * Another CPU core could run "release task" right before we acquire
>>>>>>>>> +	 * the spin lock, and thus reset / disable this GCE thread, so we
>>>>>>>>> +	 * need to check the enable bit of this GCE thread.
>>>>>>>>> +	 */
>>>>>>>>> +	if (!(cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) &
>>>>>>>>> +	      CMDQ_THR_ENABLED))
>>>>>>>>> +		irq_flag = 0;
>>>>>>>>
>>>>>>>> cmdq_handle_error_done just retuns in this case. Programming this way
>>>>>>>> just makes things confusing. What about:
>>>>>>>>
>>>>>>>> if (cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) &
>>>>>>>> 	      CMDQ_THR_ENABLED)
>>>>>>>> 	cmdq_handle_error_done(cmdq, thread, irq_flag);
>>>>>>>> else
>>>>>>>> 		irq_flag = 0;
>>>>>>>>
>>>>>>>> spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>
>>>>>>> We still need to clear irq_flag if GCE thread is disabled.
>>>>>>> So, I think we can just return here.
>>>>>>>
>>>>>>> 	if (!(cmdq_thread_readl(thread, CMDQ_THR_ENABLE_TASK) &
>>>>>>> 	      CMDQ_THR_ENABLED))
>>>>>>> 		return;
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>
>>>>>> No, you can't just return, you need to unlock the spinlock.
>>>>>> Anyway I would prefer it the other way round, as I put it in my last
>>>>>> mail. Just delete the else branch, we don't need to set irq_flag to zero.
>>>>>
>>>>> Sorry for my previous wrong reply since I merge your comment
>>>>> and CK's comment.
>>>>> http://lists.infradead.org/pipermail/linux-mediatek/2016-May/005547.html
>>>>> So, I will put this if condition into cmdq_task_handle_error_result()
>>>>> and then just return it if GCE thread is disabled.
>>>>>
>>>>
>>>> You mean in cmdq_task_handle_done
>>>> We should rename this functions to not create confusion.
>>>
>>> Sorry again. I mean in cmdq_handle_error_done().
>>> This function handles both done and error.
>>>
>>> I agree the function name looks confusion.
>>> I think it can be renamed to cmdq_thread_irq_handler()
>>> since it is used to handle irq for GCE thread.
>>>
>>>> Regards,
>>>> Matthias
>>>
>>> Thanks,
>>> HS
>>>
>>>>>>>>> +
>>>>>>>>> +	cmdq_handle_error_done(cmdq, thread, irq_flag);
>>>>>>>>> +	spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static irqreturn_t cmdq_irq_handler(int irq, void *dev)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = dev;
>>>>>>>>> +	u32 irq_status;
>>>>>>>>> +	int i;
>>>>>>>>> +
>>>>>>>>> +	irq_status = readl(cmdq->base + CMDQ_CURR_IRQ_STATUS);
>>>>>>>>> +	irq_status &= CMDQ_IRQ_MASK;
>>>>>>>>> +	irq_status ^= CMDQ_IRQ_MASK;
>>>>>>>>
>>>>>>>> irq_status can be much bigger then 3, which is the number of threads in
>>>>>>>> the system (CMDQ_THR_MAX_COUNT). So why we use this mask here isn't
>>>>>>>> clear to me.
>>>>>>>
>>>>>>> Our GCE hardware has 16 threads, but we only use 3 threads currently.
>>>>>>>
>>>>>>
>>>>>> Ok, but please use bitops here.
>>>>>
>>>>> Will use bitops.
>>>>>
>>>>>>>>> +
>>>>>>>>> +	if (!irq_status)
>>>>>>>>> +		return IRQ_NONE;
>>>>>>>>> +
>>>>>>>>> +	while (irq_status) {
>>>>>>>>> +		i = ffs(irq_status) - 1;
>>>>>>>>> +		irq_status &= ~BIT(i);
>>>>>>>>> +		cmdq_thread_irq_handler(cmdq, i);
>>>>>>>>> +	}
>>>>>>>>
>>>>>>>> Can you explain how the irq status register looks like, that would it
>>>>>>>> make much easier to understand what happens here.
>>>>>>>
>>>>>>> Bit 0 ~ 15 of irq status register represents GCE thread 0 ~ 15
>>>>>>> interrupt. 0 means asserting interrupt; 1 means no interrupt.
>>>>>>>
>>>>>>
>>>>>> Thanks, that helped. :)
>>>>>>
>>>>>>>>> +
>>>>>>>>> +	return IRQ_HANDLED;	
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_task_handle_error_result(struct cmdq_task *task)
>>>>>>>>
>>>>>>>> We never check the return values, why do we have them?
>>>>>>>
>>>>>>> Will drop return value.
>>>>>>>
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = task->cmdq;
>>>>>>>>> +	struct device *dev = cmdq->dev;
>>>>>>>>> +	struct cmdq_thread *thread = task->thread;
>>>>>>>>> +	struct cmdq_task *next_task, *prev_task;
>>>>>>>>> +	u32 irq_flag;
>>>>>>>>> +
>>>>>>>>> +	/* suspend GCE thread to ensure consistency */
>>>>>>>>> +	WARN_ON(cmdq_thread_suspend(cmdq, thread) < 0);
>>>>>>>>> +
>>>>>>>>> +	/* ISR has handled this error task */
>>>>>>>>> +	if (task->task_state == TASK_STATE_ERROR) {
>>>>>>>>> +		next_task = list_first_entry_or_null(&thread->task_busy_list,
>>>>>>>>> +				struct cmdq_task, list_entry);
>>>>>>>>> +		if (next_task) /* move to next task */
>>>>>>>>> +			cmdq_thread_writel(thread, next_task->pa_base,
>>>>>>>>> +					   CMDQ_THR_CURR_ADDR);
>>>>>>>>
>>>>>>>> We have to do this, as we suppose that the thread did not reach the jump
>>>>>>>> instruction we put into it's command queue, right?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>
>>>>>> So this should then go into it's own function. In wait_release_work,
>>>>>> something like this:
>>>>>>
>>>>>> if(task->task_state == TASK_STATE_ERROR)
>>>>>>          cmdq_task_handle_error(task)
>>>>>
>>>>> OK.
>>>>> I will write new function cmdq_task_handle_error() to handle error case.
>>>>>
>>>>>>>>> +		cmdq_thread_resume(thread);
>>>>>>>>> +		return -ECANCELED;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>
>>>>>>>> if task_state != ERROR and != DONE. This means that the timeout of
>>>>>>>> task_release_wq has timed out, right?
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>>> +	/*
>>>>>>>>> +	 * Save next_task and prev_task in advance
>>>>>>>>> +	 * since cmdq_handle_error_done will remove list_entry.
>>>>>>>>> +	 */
>>>>>>>>> +	next_task = prev_task = NULL;
>>>>>>>>> +	if (task->list_entry.next != &thread->task_busy_list)
>>>>>>>>> +		next_task = list_next_entry(task, list_entry);
>>>>>>>>> +	if (task->list_entry.prev != &thread->task_busy_list)
>>>>>>>>> +		prev_task = list_prev_entry(task, list_entry);
>>>>>>>>> +
>>>>>>>>> +	/*
>>>>>>>>> +	 * Although IRQ is disabled, GCE continues to execute.
>>>>>>>>> +	 * It may have pending IRQ before GCE thread is suspended,
>>>>>>>>> +	 * so check this condition again.
>>>>>>>>> +	 */
>>>>>>>>
>>>>>>>> The first thing we did in this function was suspending the thread. Why
>>>>>>>> do we need this then?
>>>>>>>
>>>>>>> Because timeout is CPU timeout not GCE timeout, GCE could just finish
>>>>>>> this task before the GCE thread is suspended.
>>>>>>>
>>>>>>
>>>>>> What are the reasons for a timeout? An error has happend, or the task is
>>>>>> still executing.
>>>>>
>>>>>    From GCE's point of view, this task is still executing.
>>>>> But, it could be an error of client.
>>>>> For example, task may never get event if display turn off hardware
>>>>> before waiting for task to finish its work.
>>>>>
>>>>>>>> To be honest this whole functions looks really like a design error. We
>>>>>>>> have to sperate the states much clearer so that it is possible to
>>>>>>>> understand what is happening in the GCE. Isn't it for example posible to
>>>>>>>> have worker queues for timed out tasks and tasks with an error? I'm not
>>>>>>>> sure how to do this, actually I'm not sure if I really understood how
>>>>>>>> this is supposed to work.
>>>>>>>
>>>>>>> GCE doesn't have timeout. The timeout is decided and controlled by CPU,
>>>>>>> so we check timeout in release work.
>>>>>>> For error and done, they are easy to check by register, and we have
>>>>>>> already created release work for timeout. So, I don't think we need to
>>>>>>> create work queue for each case.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>
>>>>>> I think, if we find in here, that the irq_flag is set, then the the
>>>>>> interrupt handler was triggered and is spinning the spinlock. If this is
>>>>>> not the case, we have a timeout and we handle this.
>>>>>
>>>>> I will write another function to handle error, and handle timeout here.
>>>>>
>>>>>>>> How much do we win, when we patch the thread command queue for every
>>>>>>>> task we add, instead of just taking one task after another from the
>>>>>>>> task_busy_list?
>>>>>>>
>>>>>>> GCE is used to help read/write registers with critical time limitation.
>>>>>>> Sometimes, client may ask to process multiple tasks in a short period
>>>>>>> of time, e.g. display flush multiple tasks for next vblank. So, CMDQ
>>>>>>> shouldn't limit to process one task after another from the
>>>>>>> task_busy_list. Currently, when interrupt or timeout, we will check
>>>>>>> how many tasks are done, and which one is error or timeout.
>>>>>>>
>>>>>>
>>>>>> So I suppose the device driver who use this are interested in throughput
>>>>>> and not in latency. The callback of every
>>>>>>
>>>>>>>>> +	irq_flag = cmdq_thread_readl(thread, CMDQ_THR_IRQ_STATUS);
>>>>>>>>> +	cmdq_handle_error_done(cmdq, thread, irq_flag);
>>>>>>>>> +	cmdq_thread_writel(thread, ~irq_flag, CMDQ_THR_IRQ_STATUS);
>>>>>>>>> +
>>>>>>>>> +	if (task->task_state == TASK_STATE_DONE) {
>>>>>>>>> +		cmdq_thread_resume(thread);
>>>>>>>>> +		return 0;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	if (task->task_state == TASK_STATE_ERROR) {
>>>>>>>>> +		dev_err(dev, "task 0x%p error\n", task);
>>>>>>>>> +		if (next_task) /* move to next task */
>>>>>>>>> +			cmdq_thread_writel(thread, next_task->pa_base,
>>>>>>>>> +					   CMDQ_THR_CURR_ADDR);
>>>>>>>>> +		cmdq_thread_resume(thread);
>>>>>>>>> +		return -ECANCELED;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	/* Task is running, so we force to remove it. */
>>>>>>>>> +	dev_err(dev, "task 0x%p timeout or killed\n", task);
>>>>>>>>> +	task->task_state = TASK_STATE_ERROR;
>>>>>>>>> +
>>>>>>>>> +	if (prev_task) {
>>>>>>>>> +		u64 *prev_va = prev_task->va_base;
>>>>>>>>> +		u64 *curr_va = task->va_base;
>>>>>>>>> +
>>>>>>>>> +		/* copy JUMP instruction */
>>>>>>>>> +		prev_va[prev_task->num_cmd - 1] = curr_va[task->num_cmd - 1];
>>>>>>>>> +
>>>>>>>>> +		cmdq_thread_invalidate_fetched_data(thread);
>>>>>>>>> +	} else if (next_task) { /* move to next task */
>>>>>>>>> +		cmdq_thread_writel(thread, next_task->pa_base,
>>>>>>>>> +				   CMDQ_THR_CURR_ADDR);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	list_del(&task->list_entry);
>>>>>>>>> +	cmdq_thread_resume(thread);
>>>>>>>>> +
>>>>>>>>> +	/* call cb here to prevent lock */
>>>>>>>>> +	if (task->cb.cb) {
>>>>>>>>> +		struct cmdq_cb_data cmdq_cb_data;
>>>>>>>>> +
>>>>>>>>> +		cmdq_cb_data.err = true;
>>>>>>>>> +		cmdq_cb_data.data = task->cb.data;
>>>>>>>>> +		task->cb.cb(cmdq_cb_data);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	return -ECANCELED;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_task_wait_release_work(struct work_struct *work_item)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_task *task = container_of(work_item, struct cmdq_task,
>>>>>>>>> +					      release_work);
>>>>>>>>> +	struct cmdq *cmdq = task->cmdq;
>>>>>>>>> +	struct cmdq_thread *thread = task->thread;
>>>>>>>>> +	unsigned long flags;
>>>>>>>>> +
>>>>>>>>> +	wait_event_timeout(thread->wait_task_done,
>>>>>>>>> +			   task->task_state != TASK_STATE_BUSY,
>>>>>>>>> +			   msecs_to_jiffies(CMDQ_TIMEOUT_MS));
>>>>>>>>> +
>>>>>>>>> +	spin_lock_irqsave(&cmdq->exec_lock, flags);
>>>>>>>>> +	if (task->task_state != TASK_STATE_DONE)
>>>>>>>>> +		cmdq_task_handle_error_result(task);
>>>>>>>>> +	if (list_empty(&thread->task_busy_list))
>>>>>>>>> +		cmdq_thread_disable(cmdq, thread);
>>>>>>>>> +	spin_unlock_irqrestore(&cmdq->exec_lock, flags);
>>>>>>>>> +
>>>>>>>>> +	/* release regardless of success or not */
>>>>>>>>> +	clk_disable_unprepare(cmdq->clock);
>>>>>>>>> +	cmdq_task_release(task);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void cmdq_task_wait_release_schedule(struct cmdq_task *task)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = task->cmdq;
>>>>>>>>> +
>>>>>>>>> +	INIT_WORK(&task->release_work, cmdq_task_wait_release_work);
>>>>>>>>> +	queue_work(cmdq->task_release_wq, &task->release_work);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_rec_realloc_cmd_buffer(struct cmdq_rec *rec, size_t size)
>>>>>>>>> +{
>>>>>>>>> +	void *new_buf;
>>>>>>>>> +
>>>>>>>>> +	new_buf = krealloc(rec->buf, size, GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +	if (!new_buf)
>>>>>>>>> +		return -ENOMEM;
>>>>>>>>> +	rec->buf = new_buf;
>>>>>>>>> +	rec->buf_size = size;
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +struct cmdq_base *cmdq_register_device(struct device *dev)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_base *cmdq_base;
>>>>>>>>> +	struct resource res;
>>>>>>>>> +	int subsys;
>>>>>>>>> +	u32 base;
>>>>>>>>> +
>>>>>>>>> +	if (of_address_to_resource(dev->of_node, 0, &res))
>>>>>>>>> +		return NULL;
>>>>>>>>> +	base = (u32)res.start;
>>>>>>>>> +
>>>>>>>>> +	subsys = cmdq_subsys_base_to_id(base >> 16);
>>>>>>>>> +	if (subsys < 0)
>>>>>>>>> +		return NULL;
>>>>>>>>> +
>>>>>>>>> +	cmdq_base = devm_kmalloc(dev, sizeof(*cmdq_base), GFP_KERNEL);
>>>>>>>>> +	if (!cmdq_base)
>>>>>>>>> +		return NULL;
>>>>>>>>> +	cmdq_base->subsys = subsys;
>>>>>>>>> +	cmdq_base->base = base;
>>>>>>>>> +
>>>>>>>>> +	return cmdq_base;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_register_device);
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_create(struct device *dev, u64 engine_flag,
>>>>>>>>> +		    struct cmdq_rec **rec_ptr)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_rec *rec;
>>>>>>>>> +	int err;
>>>>>>>>> +
>>>>>>>>> +	rec = kzalloc(sizeof(*rec), GFP_KERNEL);
>>>>>>>>> +	if (!rec)
>>>>>>>>> +		return -ENOMEM;
>>>>>>>>> +	rec->cmdq = dev_get_drvdata(dev);
>>>>>>>>> +	rec->engine_flag = engine_flag;
>>>>>>>>> +	err = cmdq_rec_realloc_cmd_buffer(rec, CMDQ_INITIAL_CMD_BLOCK_SIZE);
>>>>>>>>
>>>>>>>> Just pass PAGE_SIZE here, no need for CMDQ_INITIAL_CMD_BLOCK_SIZE.
>>>>>>>
>>>>>>> Will do.
>>>>>>>
>>>>>>>>> +	if (err < 0) {
>>>>>>>>> +		kfree(rec);
>>>>>>>>> +		return err;
>>>>>>>>> +	}
>>>>>>>>> +	*rec_ptr = rec;
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_create);
>>>>>>>>> +
>>>>>>>>> +static int cmdq_rec_append_command(struct cmdq_rec *rec, enum cmdq_code code,
>>>>>>>>> +				   u32 arg_a, u32 arg_b)
>>>>>>>>> +{
>>>>>>>>> +	u64 *cmd_ptr;
>>>>>>>>> +	int err;
>>>>>>>>> +
>>>>>>>>> +	if (WARN_ON(rec->finalized))
>>>>>>>>> +		return -EBUSY;
>>>>>>>>> +	if (code < CMDQ_CODE_MOVE || code > CMDQ_CODE_EOC)
>>>>>>>>> +		return -EINVAL;
>>>>>>>>
>>>>>>>> cmdq_rec_append_command is just called from inside this driver and code
>>>>>>>> is a enum. We can expect it to be correct, no need for this check.
>>>>>>>
>>>>>>> Will drop this check.
>>>>>>>
>>>>>>>>> +	if (unlikely(rec->command_size + CMDQ_INST_SIZE > rec->buf_size)) {
>>>>>>>>
>>>>>>>> command_size is the offset into the buffer to which a new command is
>>>>>>>> written, so this name is highly confusing. I wonder if this would be
>>>>>>>> easier to understand if we redefine command_size to something like the
>>>>>>>> number of commands and divide/multiply CMDQ_INST_SIZE where this is needed.
>>>>>>>
>>>>>>> I can rename command_size to cmd_buf_size and calculate num_cmd by
>>>>>>> dividing CMDQ_INST_SIZE.
>>>>>>> What do you think?
>>>>>>>
>>>>>>>>> +		err = cmdq_rec_realloc_cmd_buffer(rec, rec->buf_size * 2);
>>>>>>>>> +		if (err < 0)
>>>>>>>>> +			return err;
>>>>>>>>> +	}
>>>>>>>>> +	cmd_ptr = rec->buf + rec->command_size;
>>>>>>>>> +	(*cmd_ptr) = (u64)((code << CMDQ_OP_CODE_SHIFT) | arg_a) << 32 | arg_b;
>>>>>>>>> +	rec->command_size += CMDQ_INST_SIZE;
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_write(struct cmdq_rec *rec, u32 value, struct cmdq_base *base,
>>>>>>>>> +		   u32 offset)
>>>>>>>>> +{
>>>>>>>>> +	u32 arg_a = ((base->base + offset) & CMDQ_ARG_A_WRITE_MASK) |
>>>>>>>>> +		    ((base->subsys & CMDQ_SUBSYS_MASK) << CMDQ_SUBSYS_SHIFT);
>>>>>>>>
>>>>>>>> base->subsys is the id from gce_sybsys, so we can expect it to be
>>>>>>>> correct, no need to mask with CMDQ_SUBSYS_MASK.
>>>>>>>
>>>>>>> Will drop it.
>>>>>>>
>>>>>>>>> +	return cmdq_rec_append_command(rec, CMDQ_CODE_WRITE, arg_a, value);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_write);
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_write_mask(struct cmdq_rec *rec, u32 value,
>>>>>>>>> +			struct cmdq_base *base, u32 offset, u32 mask)
>>>>>>>>> +{
>>>>>>>>> +	u32 offset_mask = offset;
>>>>>>>>> +	int err;
>>>>>>>>> +
>>>>>>>>> +	if (mask != 0xffffffff) {
>>>>>>>>> +		err = cmdq_rec_append_command(rec, CMDQ_CODE_MOVE, 0, ~mask);
>>>>>>>>> +		if (err < 0)
>>>>>>>>> +			return err;
>>>>>>>>> +		offset_mask |= CMDQ_WRITE_ENABLE_MASK;
>>>>>>>>> +	}
>>>>>>>>> +	return cmdq_rec_write(rec, value, base, offset_mask);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_write_mask);
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_wfe(struct cmdq_rec *rec, enum cmdq_event event)
>>>>>>>>> +{
>>>>>>>>> +	u32 arg_b;
>>>>>>>>> +
>>>>>>>>> +	if (event >= CMDQ_MAX_HW_EVENT_COUNT || event < 0)
>>>>>>>>> +		return -EINVAL;
>>>>>>>>> +
>>>>>>>>> +	/*
>>>>>>>>> +	 * bit 0-11: wait value
>>>>>>>>> +	 * bit 15: 1 - wait, 0 - no wait
>>>>>>>>> +	 * bit 16-27: update value
>>>>>>>>> +	 * bit 31: 1 - update, 0 - no update
>>>>>>>>> +	 */
>>>>>>>>
>>>>>>>> I don't understand this comments. What are they for?
>>>>>>>
>>>>>>> This is for WFE command. I will comment it.
>>>>>>>
>>>>>>>>> +	arg_b = CMDQ_WFE_UPDATE | CMDQ_WFE_WAIT | CMDQ_WFE_WAIT_VALUE;
>>>>>>>>> +	return cmdq_rec_append_command(rec, CMDQ_CODE_WFE, event, arg_b);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_wfe);
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_clear_event(struct cmdq_rec *rec, enum cmdq_event event)
>>>>>>>>> +{
>>>>>>>>> +	if (event >= CMDQ_MAX_HW_EVENT_COUNT || event < 0)
>>>>>>>>> +		return -EINVAL;
>>>>>>>>> +
>>>>>>>>> +	return cmdq_rec_append_command(rec, CMDQ_CODE_WFE, event,
>>>>>>>>> +				       CMDQ_WFE_UPDATE);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_clear_event);
>>>>>>>>> +
>>>>>>>>> +static int cmdq_rec_finalize(struct cmdq_rec *rec)
>>>>>>>>> +{
>>>>>>>>> +	int err;
>>>>>>>>> +
>>>>>>>>> +	if (rec->finalized)
>>>>>>>>> +		return 0;
>>>>>>>>> +
>>>>>>>>> +	/* insert EOC and generate IRQ for each command iteration */
>>>>>>>>> +	err = cmdq_rec_append_command(rec, CMDQ_CODE_EOC, 0, CMDQ_EOC_IRQ_EN);
>>>>>>>>> +	if (err < 0)
>>>>>>>>> +		return err;
>>>>>>>>> +
>>>>>>>>> +	/* JUMP to end */
>>>>>>>>> +	err = cmdq_rec_append_command(rec, CMDQ_CODE_JUMP, 0, CMDQ_JUMP_PASS);
>>>>>>>>> +	if (err < 0)
>>>>>>>>> +		return err;
>>>>>>>>> +
>>>>>>>>
>>>>>>>> Does this need to be atomic?
>>>>>>>> What happens if after CODE_EOC and before CODE_JUMP some
>>>>>>>> write/read/event gets added?
>>>>>>>> What happens if more commands get added to the queue after CODE_JUMP,
>>>>>>>> but before finalized is set to true. Why don't you use atomic functions
>>>>>>>> to access finalized?
>>>>>>>
>>>>>>> Since cmdq_rec doesn't guarantee thread safe, mutex is needed when
>>>>>>> client uses cmdq_rec.
>>>>>>>
>>>>>>
>>>>>> Well I think that rec->finalized tries to implement this, but might
>>>>>> fail, if two kernel threads work on the same cmdq_rec.
>>>>>>
>>>>>>>>> +	rec->finalized = true;
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_flush_async(struct cmdq_rec *rec, cmdq_async_flush_cb cb,
>>>>>>>>> +			 void *data)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = rec->cmdq;
>>>>>>>>> +	struct cmdq_task *task;
>>>>>>>>> +	struct cmdq_task_cb task_cb;
>>>>>>>>> +	struct cmdq_thread *thread;
>>>>>>>>> +	int err;
>>>>>>>>> +
>>>>>>>>> +	mutex_lock(&cmdq->task_mutex);
>>>>>>>>> +	if (cmdq->suspended) {
>>>>>>>>> +		dev_err(cmdq->dev, "%s is called after suspended\n", __func__);
>>>>>>>>> +		mutex_unlock(&cmdq->task_mutex);
>>>>>>>>> +		return -EPERM;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	err = cmdq_rec_finalize(rec);
>>>>>>>>> +	if (err < 0) {
>>>>>>>>> +		mutex_unlock(&cmdq->task_mutex);
>>>>>>>>> +		return err;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	task_cb.cb = cb;
>>>>>>>>> +	task_cb.data = data;
>>>>>>>>> +	task = cmdq_task_acquire(rec, task_cb);
>>>>>>>>> +	if (!task) {
>>>>>>>>> +		mutex_unlock(&cmdq->task_mutex);
>>>>>>>>> +		return -EFAULT;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	thread = &cmdq->thread[cmdq_eng_get_thread(task->engine_flag)];
>>>>>>>>> +	cmdq_task_exec(task, thread);
>>>>>>>>> +	cmdq_task_wait_release_schedule(task);
>>>>>>>>> +	mutex_unlock(&cmdq->task_mutex);
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_flush_async);
>>>>>>>>> +
>>>>>>>>> +struct cmdq_flush_completion {
>>>>>>>>> +	struct completion cmplt;
>>>>>>>>> +	bool err;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static int cmdq_rec_flush_cb(struct cmdq_cb_data data)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_flush_completion *cmplt = data.data;
>>>>>>>>> +
>>>>>>>>> +	cmplt->err = data.err;
>>>>>>>>> +	complete(&cmplt->cmplt);
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +int cmdq_rec_flush(struct cmdq_rec *rec)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_flush_completion cmplt;
>>>>>>>>> +	int err;
>>>>>>>>> +
>>>>>>>>> +	init_completion(&cmplt.cmplt);
>>>>>>>>> +	err = cmdq_rec_flush_async(rec, cmdq_rec_flush_cb, &cmplt);
>>>>>>>>> +	if (err < 0)
>>>>>>>>> +		return err;
>>>>>>>>> +	wait_for_completion(&cmplt.cmplt);
>>>>>>>>> +	return cmplt.err ? -EFAULT : 0;
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_flush);
>>>>>>>>> +
>>>>>>>>> +void cmdq_rec_destroy(struct cmdq_rec *rec)
>>>>>>>>> +{
>>>>>>>>> +	kfree(rec->buf);
>>>>>>>>> +	kfree(rec);
>>>>>>>>> +}
>>>>>>>>> +EXPORT_SYMBOL(cmdq_rec_destroy);
>>>>>>>>> +
>>>>>>>>> +static bool cmdq_task_is_empty(struct cmdq *cmdq)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq_thread *thread;
>>>>>>>>> +	int i;
>>>>>>>>> +
>>>>>>>>> +	for (i = 0; i < ARRAY_SIZE(cmdq->thread); i++) {
>>>>>>>>> +		thread = &cmdq->thread[i];
>>>>>>>>> +		if (!list_empty(&thread->task_busy_list))
>>>>>>>>> +			return false;
>>>>>>>>> +	}
>>>>>>>>> +	return true;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_suspend(struct device *dev)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = dev_get_drvdata(dev);
>>>>>>>>> +	u32 exec_threads;
>>>>>>>>> +
>>>>>>>>> +	mutex_lock(&cmdq->task_mutex);
>>>>>>>>> +	cmdq->suspended = true;
>>>>>>>>> +	mutex_unlock(&cmdq->task_mutex);
>>>>>>>>> +
>>>>>>>>> +	exec_threads = readl(cmdq->base + CMDQ_CURR_LOADED_THR);
>>>>>>>>> +	if ((exec_threads & CMDQ_THR_EXECUTING) && !cmdq_task_is_empty(cmdq)) {
>>>>>>>>> +		dev_err(dev, "wait active tasks timeout.\n");
>>>>>>>>> +		flush_workqueue(cmdq->task_release_wq);
>>>>>>>>> +	}
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_resume(struct device *dev)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = dev_get_drvdata(dev);
>>>>>>>>> +
>>>>>>>>> +	cmdq->suspended = false;
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_remove(struct platform_device *pdev)
>>>>>>>>> +{
>>>>>>>>> +	struct cmdq *cmdq = platform_get_drvdata(pdev);
>>>>>>>>> +
>>>>>>>>> +	destroy_workqueue(cmdq->task_release_wq);
>>>>>>>>> +	cmdq->task_release_wq = NULL;
>>>>>>>>> +	return 0;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static int cmdq_probe(struct platform_device *pdev)
>>>>>>>>> +{
>>>>>>>>> +	struct device *dev = &pdev->dev;
>>>>>>>>> +	struct device_node *node = dev->of_node;
>>>>>>>>> +	struct resource *res;
>>>>>>>>> +	struct cmdq *cmdq;
>>>>>>>>> +	int err, i;
>>>>>>>>> +
>>>>>>>>> +	cmdq = devm_kzalloc(dev, sizeof(*cmdq), GFP_KERNEL);
>>>>>>>>> +	if (!cmdq)
>>>>>>>>> +		return -ENOMEM;
>>>>>>>>> +	cmdq->dev = dev;
>>>>>>>>> +
>>>>>>>>> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>>>>>>>>> +	cmdq->base = devm_ioremap_resource(dev, res);
>>>>>>>>> +	if (IS_ERR(cmdq->base)) {
>>>>>>>>> +		dev_err(dev, "failed to ioremap gce\n");
>>>>>>>>> +		return PTR_ERR(cmdq->base);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	cmdq->irq = irq_of_parse_and_map(node, 0);
>>>>>>>>> +	if (!cmdq->irq) {
>>>>>>>>> +		dev_err(dev, "failed to get irq\n");
>>>>>>>>> +		return -EINVAL;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	dev_dbg(dev, "cmdq device: addr:0x%p, va:0x%p, irq:%d\n",
>>>>>>>>> +		dev, cmdq->base, cmdq->irq);
>>>>>>>>> +
>>>>>>>>> +	mutex_init(&cmdq->task_mutex);
>>>>>>>>> +	spin_lock_init(&cmdq->exec_lock);
>>>>>>>>> +	cmdq->task_release_wq = alloc_ordered_workqueue(
>>>>>>>>> +			"%s", WQ_MEM_RECLAIM | WQ_HIGHPRI,
>>>>>>>>> +			"cmdq_task_wait_release");
>>>>>>>>> +
>>>>>>>>> +	for (i = 0; i < ARRAY_SIZE(cmdq->thread); i++) {
>>>>>>>>> +		cmdq->thread[i].base = cmdq->base + CMDQ_THR_BASE +
>>>>>>>>> +				CMDQ_THR_SHIFT * i;
>>>>>>>>> +		init_waitqueue_head(&cmdq->thread[i].wait_task_done);
>>>>>>>>> +		INIT_LIST_HEAD(&cmdq->thread[i].task_busy_list);
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	platform_set_drvdata(pdev, cmdq);
>>>>>>>>> +
>>>>>>>>> +	err = devm_request_irq(dev, cmdq->irq, cmdq_irq_handler, IRQF_SHARED,
>>>>>>>>> +			       CMDQ_DRIVER_DEVICE_NAME, cmdq);
>>>>>>>>> +	if (err < 0) {
>>>>>>>>> +		dev_err(dev, "failed to register ISR (%d)\n", err);
>>>>>>>>> +		goto fail;
>>>>>>>>> +	}
>>>>>>>>> +
>>>>>>>>> +	cmdq->clock = devm_clk_get(dev, CMDQ_CLK_NAME);
>>>>>>>>> +	if (IS_ERR(cmdq->clock)) {
>>>>>>>>> +		dev_err(dev, "failed to get clk:%s\n", CMDQ_CLK_NAME);
>>>>>>>>> +		err = PTR_ERR(cmdq->clock);
>>>>>>>>> +		goto fail;
>>>>>>>>> +	}
>>>>>>>>> +	return 0;
>>>>>>>>> +
>>>>>>>>> +fail:
>>>>>>>>> +	cmdq_remove(pdev);
>>>>>>>>> +	return err;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static const struct dev_pm_ops cmdq_pm_ops = {
>>>>>>>>> +	.suspend = cmdq_suspend,
>>>>>>>>> +	.resume = cmdq_resume,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static const struct of_device_id cmdq_of_ids[] = {
>>>>>>>>> +	{.compatible = "mediatek,mt8173-gce",},
>>>>>>>>> +	{}
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static struct platform_driver cmdq_drv = {
>>>>>>>>> +	.probe = cmdq_probe,
>>>>>>>>> +	.remove = cmdq_remove,
>>>>>>>>> +	.driver = {
>>>>>>>>> +		.name = CMDQ_DRIVER_DEVICE_NAME,
>>>>>>>>> +		.owner = THIS_MODULE,
>>>>>>>>> +		.pm = &cmdq_pm_ops,
>>>>>>>>> +		.of_match_table = cmdq_of_ids,
>>>>>>>>> +	}
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +builtin_platform_driver(cmdq_drv);
>>>>>>>>> diff --git a/include/soc/mediatek/cmdq.h b/include/soc/mediatek/cmdq.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 0000000..60eef3d
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/include/soc/mediatek/cmdq.h
>>>>>>>>> @@ -0,0 +1,197 @@
>>>>>>>>> +/*
>>>>>>>>> + * Copyright (c) 2015 MediaTek Inc.
>>>>>>>>> + *
>>>>>>>>> + * This program is free software; you can redistribute it and/or modify
>>>>>>>>> + * it under the terms of the GNU General Public License version 2 as
>>>>>>>>> + * published by the Free Software Foundation.
>>>>>>>>> + *
>>>>>>>>> + * This program is distributed in the hope that it will be useful,
>>>>>>>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>>>>>>> + * GNU General Public License for more details.
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +#ifndef __MTK_CMDQ_H__
>>>>>>>>> +#define __MTK_CMDQ_H__
>>>>>>>>> +
>>>>>>>>> +#include <linux/platform_device.h>
>>>>>>>>> +#include <linux/types.h>
>>>>>>>>> +
>>>>>>>>> +enum cmdq_eng {
>>>>>>>>> +	CMDQ_ENG_DISP_AAL,
>>>>>>>>> +	CMDQ_ENG_DISP_COLOR0,
>>>>>>>>> +	CMDQ_ENG_DISP_COLOR1,
>>>>>>>>> +	CMDQ_ENG_DISP_DPI0,
>>>>>>>>> +	CMDQ_ENG_DISP_DSI0,
>>>>>>>>> +	CMDQ_ENG_DISP_DSI1,
>>>>>>>>> +	CMDQ_ENG_DISP_GAMMA,
>>>>>>>>> +	CMDQ_ENG_DISP_OD,
>>>>>>>>> +	CMDQ_ENG_DISP_OVL0,
>>>>>>>>> +	CMDQ_ENG_DISP_OVL1,
>>>>>>>>> +	CMDQ_ENG_DISP_PWM0,
>>>>>>>>> +	CMDQ_ENG_DISP_PWM1,
>>>>>>>>> +	CMDQ_ENG_DISP_RDMA0,
>>>>>>>>> +	CMDQ_ENG_DISP_RDMA1,
>>>>>>>>> +	CMDQ_ENG_DISP_RDMA2,
>>>>>>>>> +	CMDQ_ENG_DISP_UFOE,
>>>>>>>>> +	CMDQ_ENG_DISP_WDMA0,
>>>>>>>>> +	CMDQ_ENG_DISP_WDMA1,
>>>>>>>>> +	CMDQ_ENG_MAX,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/* events for CMDQ and display */
>>>>>>>>> +enum cmdq_event {
>>>>>>>>> +	/* Display start of frame(SOF) events */
>>>>>>>>> +	CMDQ_EVENT_DISP_OVL0_SOF = 11,
>>>>>>>>> +	CMDQ_EVENT_DISP_OVL1_SOF = 12,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA0_SOF = 13,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA1_SOF = 14,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA2_SOF = 15,
>>>>>>>>> +	CMDQ_EVENT_DISP_WDMA0_SOF = 16,
>>>>>>>>> +	CMDQ_EVENT_DISP_WDMA1_SOF = 17,
>>>>>>>>> +	/* Display end of frame(EOF) events */
>>>>>>>>> +	CMDQ_EVENT_DISP_OVL0_EOF = 39,
>>>>>>>>> +	CMDQ_EVENT_DISP_OVL1_EOF = 40,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA0_EOF = 41,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA1_EOF = 42,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA2_EOF = 43,
>>>>>>>>> +	CMDQ_EVENT_DISP_WDMA0_EOF = 44,
>>>>>>>>> +	CMDQ_EVENT_DISP_WDMA1_EOF = 45,
>>>>>>>>> +	/* Mutex end of frame(EOF) events */
>>>>>>>>> +	CMDQ_EVENT_MUTEX0_STREAM_EOF = 53,
>>>>>>>>> +	CMDQ_EVENT_MUTEX1_STREAM_EOF = 54,
>>>>>>>>> +	CMDQ_EVENT_MUTEX2_STREAM_EOF = 55,
>>>>>>>>> +	CMDQ_EVENT_MUTEX3_STREAM_EOF = 56,
>>>>>>>>> +	CMDQ_EVENT_MUTEX4_STREAM_EOF = 57,
>>>>>>>>> +	/* Display underrun events */
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA0_UNDERRUN = 63,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA1_UNDERRUN = 64,
>>>>>>>>> +	CMDQ_EVENT_DISP_RDMA2_UNDERRUN = 65,
>>>>>>>>> +	/* Keep this at the end of HW events */
>>>>>>>>> +	CMDQ_MAX_HW_EVENT_COUNT = 260,
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +struct cmdq_cb_data {
>>>>>>>>> +	bool	err;
>>>>>>>>> +	void	*data;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +typedef int (*cmdq_async_flush_cb)(struct cmdq_cb_data data);
>>>>>>>>> +
>>>>>>>>> +struct cmdq_task;
>>>>>>>>> +struct cmdq;
>>>>>>>>> +
>>>>>>>>> +struct cmdq_rec {
>>>>>>>>> +	struct cmdq		*cmdq;
>>>>>>>>> +	u64			engine_flag;
>>>>>>>>> +	size_t			command_size;
>>>>>>>>> +	void			*buf;
>>>>>>>>> +	size_t			buf_size;
>>>>>>>>> +	bool			finalized;
>>>>>>>>> +};
>>>>>>
>>>>>> Why do we need cmdq_rec at all? Can't we just use the cmdq_task for that
>>>>>> and this way make the driver less complex?
>>>>>
>>>>> There are two main reasons for cmdq_rec.
>>>>> 1. It is slow to access dma too frequently.
>>>>>       So, we append commands to cacheable memory, and then flush to dma.
>>>>> 2. cmdq_rec is not thread safe, but cmdq_task needs thread safe.
>>>>>       If we merge them, we need to take care of some synchronization
>>>>>       issues.
>>>>>
>>>>>>>>> +
>>>>>>>>> +struct cmdq_base {
>>>>>>>>> +	int	subsys;
>>>>>>>>> +	u32	base;
>>>>>>>>
>>>>>>>> subsys can always be calculated via cmdq_subsys_base_to_id(base >> 16)
>>>>>>>> so we can get rid of the struct, right?
>>>>>>>
>>>>>>> Current subsys method is based on previous comment from Daniel Kurtz.
>>>>>>> Please take a look of our previous discussion.
>>>>>>> http://lists.infradead.org/pipermail/linux-mediatek/2016-February/004483.html
>>>>>>> Thanks.
>>>>>>>
>>>>>>
>>>>>> I have to look deeper into this, but from what I read, the proposal from
>>>>>> Daniel [1] seems good to me.
>>>>>>
>>>>>> [1] https://patchwork.kernel.org/patch/8068311/
>>>>>
>>>>> The initial proposal has some problem, so please see the follow-up
>>>>> discussions. Thanks.
>>>>> http://lists.infradead.org/pipermail/linux-mediatek/2016-February/003972.html
>>>>> http://lists.infradead.org/pipermail/linux-mediatek/2016-February/004483.html
>>>>>
>>>>>
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_register_device() - register device which needs CMDQ
>>>>>>>>> + * @dev:		device
>>>>>>>>> + *
>>>>>>>>> + * Return: cmdq_base pointer or NULL for failed
>>>>>>>>> + */
>>>>>>>>> +struct cmdq_base *cmdq_register_device(struct device *dev);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_create() - create command queue record
>>>>>>>>> + * @dev:		device
>>>>>>>>> + * @engine_flag:	command queue engine flag
>>>>>>>>> + * @rec_ptr:		command queue record pointer to retrieve cmdq_rec
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_create(struct device *dev, u64 engine_flag,
>>>>>>>>> +		    struct cmdq_rec **rec_ptr);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_write() - append write command to the command queue record
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + * @value:	the specified target register value
>>>>>>>>> + * @base:	the command queue base
>>>>>>>>> + * @offset:	register offset from module base
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_write(struct cmdq_rec *rec, u32 value,
>>>>>>>>> +		   struct cmdq_base *base, u32 offset);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_write_mask() - append write command with mask to the command
>>>>>>>>> + *			   queue record
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + * @value:	the specified target register value
>>>>>>>>> + * @base:	the command queue base
>>>>>>>>> + * @offset:	register offset from module base
>>>>>>>>> + * @mask:	the specified target register mask
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_write_mask(struct cmdq_rec *rec, u32 value,
>>>>>>>>> +			struct cmdq_base *base, u32 offset, u32 mask);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_wfe() - append wait for event command to the command queue reco	rd
>>>>>>>>
>>>>>>>> reco rd -> record
>>>>>>>
>>>>>>> Will fix it.
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Matthias
>>>>>>>>
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + * @event:	the desired event type to "wait and CLEAR"
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_wfe(struct cmdq_rec *rec, enum cmdq_event event);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_clear_event() - append clear event command to the command queue
>>>>>>>>> + *			    record
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + * @event:	the desired event to be cleared
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_clear_event(struct cmdq_rec *rec, enum cmdq_event event);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_flush() - trigger CMDQ to execute the recorded commands
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + *
>>>>>>>>> + * Trigger CMDQ to execute the recorded commands. Note that this is a
>>>>>>>>> + * synchronous flush function. When the function returned, the recorded
>>>>>>>>> + * commands have been done.
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_flush(struct cmdq_rec *rec);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_flush_async() - trigger CMDQ to asynchronously execute the recorded
>>>>>>>>> + *			    commands and call back after ISR is finished
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + * @cb:		called in the end of CMDQ ISR
>>>>>>>>> + * @data:	this data will pass back to cb
>>>>>>>>> + *
>>>>>>>>> + * Return: 0 for success; else the error code is returned
>>>>>>>>> + *
>>>>>>>>> + * Trigger CMDQ to asynchronously execute the recorded commands and call back
>>>>>>>>> + * after ISR is finished. Note that this is an ASYNC function. When the function
>>>>>>>>> + * returned, it may or may not be finished. The ISR callback function is called
>>>>>>>>> + * in the end of ISR.
>>>>>>
>>>>>> "The callback is called from the ISR."
>>>>>>
>>>>>> Regards,
>>>>>> Matthias
>>>>>>
>>>>>>>>> + */
>>>>>>>>> +int cmdq_rec_flush_async(struct cmdq_rec *rec, cmdq_async_flush_cb cb,
>>>>>>>>> +			 void *data);
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * cmdq_rec_destroy() - destroy command queue record
>>>>>>>>> + * @rec:	the command queue record
>>>>>>>>> + */
>>>>>>>>> +void cmdq_rec_destroy(struct cmdq_rec *rec);
>>>>>>>>> +
>>>>>>>>> +#endif	/* __MTK_CMDQ_H__ */
>>>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> HS
>>>>>>>
>>>>>
>>>>> Thanks,
>>>>> HS
>>>>>
>>>
>>>
>
>