[PATCH 3/3] arm64: dts: meson: Describe G12b GPU as coherent
Steven Price
steven.price at arm.com
Mon Oct 5 06:05:04 EDT 2020
On 05/10/2020 09:39, Boris Brezillon wrote:
> On Mon, 5 Oct 2020 09:34:06 +0100
> Steven Price <steven.price at arm.com> wrote:
>
>> On 05/10/2020 09:15, Boris Brezillon wrote:
>>> Hi Robin, Neil,
>>>
>>> On Wed, 16 Sep 2020 10:26:43 +0200
>>> Neil Armstrong <narmstrong at baylibre.com> wrote:
>>>
>>>> Hi Robin,
>>>>
>>>> On 16/09/2020 01:51, Robin Murphy wrote:
>>>>> According to a downstream commit I found in the Khadas vendor kernel,
>>>>> the GPU on G12b is wired up for ACE-lite, so (now that Panfrost knows
>>>>> how to handle this properly) we should describe it as such. Otherwise
>>>>> the mismatch leads to all manner of fun with mismatched attributes and
>>>>> inadvertently snooping stale data from caches, which would account for
>>>>> at least some of the brokenness observed on this platform.
>>>>>
>>>>> Signed-off-by: Robin Murphy <robin.murphy at arm.com>
>>>>> ---
>>>>> arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> index 9b8548e5f6e5..ee8fcae9f9f0 100644
>>>>> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
>>>>> @@ -135,3 +135,7 @@ map1 {
>>>>> };
>>>>> };
>>>>> };
>>>>> +
>>>>> +&mali {
>>>>> + dma-coherent;
>>>>> +};
>>>>>
>>>>
>>>> Thanks a lot for digging, I'll run a test to confirm it fixes the issue !
>>>
>>> Sorry for the late reply. I triggered a dEQP run with this patch applied
>>> and I see a bunch of "panfrost ffe40000.gpu: matching BO is not heap type"
>>> errors (see below for a full backtrace). That doesn't seem to happen when
>>> we drop this dma-coherent property.
>>>
>>> [ 690.945731] ------------[ cut here ]------------
>>> [ 690.950003] panfrost ffe40000.gpu: matching BO is not heap type (GPU VA = 319a000)
>>> [ 690.950051] WARNING: CPU: 0 PID: 120 at drivers/gpu/drm/panfrost/panfrost_mmu.c:465 panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [ 690.968854] Modules linked in:
>>> [ 690.971878] CPU: 0 PID: 120 Comm: irq/27-panfrost Tainted: G W 5.9.0-rc5-02434-g7d8109ec5a42 #784
>>> [ 690.981964] Hardware name: Khadas VIM3 (DT)
>>> [ 690.986107] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
>>> [ 690.991627] pc : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [ 690.997232] lr : panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [ 691.002836] sp : ffff800011bcbcd0
>>> [ 691.006114] x29: ffff800011bcbcf0 x28: ffff0000f3fe3800
>>> [ 691.011375] x27: ffff0000ceaf5350 x26: ffff0000ca5fc500
>>> [ 691.016636] x25: ffff0000f32409c0 x24: 0000000000000001
>>> [ 691.021897] x23: ffff0000f3240880 x22: ffff0000f3e3a800
>>> [ 691.027159] x21: 0000000000000000 x20: 0000000000000000
>>> [ 691.032420] x19: 0000000000010001 x18: 0000000000000020
>>> [ 691.037681] x17: 0000000000000000 x16: 0000000000000000
>>> [ 691.042942] x15: ffff0000f3fe3c70 x14: ffffffffffffffff
>>> [ 691.048204] x13: ffff8000116c2428 x12: ffff8000116c2086
>>> [ 691.053466] x11: ffff800011bcbcd0 x10: ffff800011bcbcd0
>>> [ 691.058727] x9 : 00000000fffffffe x8 : 0000000000000000
>>> [ 691.063988] x7 : 7420706165682074 x6 : ffff8000116c1816
>>> [ 691.069249] x5 : 0000000000000000 x4 : 0000000000000000
>>> [ 691.074510] x3 : 00000000ffffffff x2 : ffff8000e348c000
>>> [ 691.079771] x1 : f1b91ff9af2df000 x0 : 0000000000000000
>>> [ 691.085033] Call trace:
>>> [ 691.087452] panfrost_mmu_irq_handler_thread+0x47c/0x650
>>> [ 691.092712] irq_thread_fn+0x2c/0xa0
>>> [ 691.096246] irq_thread+0x184/0x208
>>> [ 691.099699] kthread+0x140/0x160
>>> [ 691.102890] ret_from_fork+0x10/0x34
>>> [ 691.106424] ---[ end trace b5dd8c2dfada8236 ]---
>>>
>>
>> It's quite possible this is caused by the GPU seeing a stale page table
>> entry, so perhaps coherency isn't working as well as it should...
>>
>> Do you get an "Unhandled Page fault" message after this?
>
> Yep (see below).
>
> --->8---
>
[...]
> [ 689.805864] panfrost ffe40000.gpu: Unhandled Page fault in AS0 at VA 0x0000000003146080
> [ 689.805864] Reason: TODO
> [ 689.805864] raw fault status: 0x10003C3
> [ 689.805864] decoded fault status: SLAVE FAULT
> [ 689.805864] exception type 0xC3: TRANSLATION_FAULT_LEVEL3
> [ 689.805864] access type 0x3: WRITE
> [ 689.805864] source id 0x100
> [ 690.170419] panfrost ffe40000.gpu: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x3101100, tail=0x3101100, sched_job=000000004b442768
> [ 690.770373] panfrost ffe40000.gpu: error powering up gpu shader
> [ 690.945123] panfrost ffe40000.gpu: error powering up gpu shader
> [ 690.945731] ------------[ cut here ]------------
>
That's a write fault at level 3 of the page table, triggered by shader
core 0 in a fragment job, so it could be the GPU writing out the frame
buffer. It would be interesting to see whether a patch like the one
below works around it:
----8<----
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index e8f7b11352d2..5144860afdea 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -460,9 +460,12 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 	bo = bomapping->obj;
 	if (!bo->is_heap) {
-		dev_WARN(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
+		dev_err(pfdev->dev, "matching BO is not heap type (GPU VA = %llx)",
 			 bomapping->mmnode.start << PAGE_SHIFT);
-		ret = -EINVAL;
+		/* Force a flush of the MMU to restart the transaction */
+		mmu_hw_do_operation(pfdev, bomapping->mmu, addr, 0,
+				    AS_COMMAND_FLUSH_MEM);
+		ret = 0;
 		goto err_bo;
 	}
 	WARN_ON(bomapping->mmu->as != as);
---8<---
That's obviously not the correct solution (the fault shouldn't occur),
but if flushing the GPU's caches and retrying works, then we know it's
a coherency issue. You could also try backing off from the big hammer
of AS_COMMAND_FLUSH_MEM and using AS_COMMAND_FLUSH_PT or
AS_COMMAND_UNLOCK instead.
Steve