Kernel oops in vc4_overflow_mem_work

Eric Anholt eric at anholt.net
Wed Nov 8 14:46:00 PST 2017


Stefan Schake <stschake at gmail.com> writes:

> Hey everyone,
>
> I wanted to follow up on an issue that was first reported here:
>
> https://github.com/anholt/linux/issues/114
>
> A patch was proposed here, but it doesn't appear to address the issue:
>
> https://patchwork.freedesktop.org/patch/181157/
>
> Looking at the stack trace it's a null pointer dereference at +0x88 so
> that didn't seem symptomatic of an uninitialized spin lock. On my
> kernel which occasionally showed this issue, the PC matched up with
> line 95 of vc4_irq.c:
>
>    V3D_WRITE(V3D_BPOA, bo->base.paddr + bin_bo_slot * vc4->bin_alloc_size);
>
> And more specifically, the bo->base.paddr expression indicating that
> bo (from vc4->bin_bo, the binner BO) is null when the problem occurs.
>
> There are only two places where bin_bo is specifically set to null,
> the first in line 964 of vc4_gem.c (vc4_gem_destroy). This is invoked
> when binding fails but has a comment stating that the interrupts
> should be disabled at this point.
>
> The second occurrence is in line 307 of vc4_v3d.c
> (vc4_v3d_runtime_suspend). This comes immediately after
> vc4_irq_uninstall, which first acks & disables all interrupts in the
> VC4 registers then synchronously cancels the overflow work queue (so
> all pending or running vc4_overflow_mem_work is completed before
> returning). The problem with this order is that since binner overflow
> interrupt handling is offloaded to a work queue, the worker
> vc4_overflow_mem_work will specifically enable the interrupt bit
> again. So in the rare case that there was pending overflow work during
> vc4_irq_uninstall, we will first disable the interrupt, then wait for
> the remaining work to complete - which will enable the interrupt
> again.

Nice catch.  Also, it looks like when vc4_irq_uninstall() is masking off
the interrupt, we might already be in the interrupt handler on another
CPU, for a similar race.

So, I think maybe we need to have the powerdown path do
disable_irq_sync(), then cancel_work_sync().
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 832 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-rpi-kernel/attachments/20171108/655c6c05/attachment.sig>


More information about the linux-rpi-kernel mailing list