[PATCH] media: hantro: Check whether reset op is defined before use

Marek Vasut marex at denx.de
Wed Aug 30 12:13:13 PDT 2023

On 8/30/23 05:38, Chen-Yu Tsai wrote:
> On Sun, Aug 27, 2023 at 5:44 AM Marek Vasut <marex at denx.de> wrote:
>> On 8/25/23 10:52, Chen-Yu Tsai wrote:
>>> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex at denx.de> wrote:
>>>> On 8/25/23 09:09, Chen-Yu Tsai wrote:
>>>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex at denx.de> wrote:
>>>>>> On 8/24/23 12:39, Adam Ford wrote:
>>>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex at denx.de> wrote:
>>>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is
>>>>>>>> done by genpd. Check whether the .reset op is defined before calling it
>>>>>>>> to avoid NULL pointer dereference.
>>>>>>>> Note that the Fixes tag is set to the commit which removed the reset op
>>>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit
>>>>>>>> all the implementations did define the .reset op.
>>>>>>> I am surprised I didn't have issues when I was testing the 8MQ and
>>>>>>> 8MM, but this makes sense.
>>>>>> You need to trigger the VPU watchdog to trigger the crash, that means
>>>>>> you have to get the VPU into some weird state where it fails to decode
>>>>>> frame. Then it triggers the reset and ... boom.
>>>>>> See this patch, that contains a gstreamer invocation to generate such
>>>>>> trigger condition input data:
>>>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if
>>>>>> source is narrower than destination
>>>>>> "
>>>>>> To generate input test data to trigger this bug, use e.g.:
>>>>>> $ gst-launch-1.0 videotestsrc !
>>>>>> video/x-raw,width=272,height=256,format=I420 ! \
>>>>>>                      vp9enc ! matroskamux ! filesink location=/tmp/test.vp9
>>>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as
>>>>>> that assures the output data would pass the G2 postproc):
>>>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux !
>>>>>> vp9parse ! \
>>>>>>                      v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert
>>>>>> ! fbdevsink
>>>>>> "
>>>>> Does it completely recover afterwards? In my previous trials the hardware
>>>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum
>>>>> didn't match up.
>>>> Have you got a testcase that triggers this, one I can try ?
>>>> I am not entirely sure whether this is happening here as well or not,
>>>> but I can imagine that the power domain went down and back up between
>>>> tests, so the VPU would be power cycled (and therefore reset) that way.
>>>> So, I think it is worth testing that.
>>> This was last year while I was writing HEVC decoding code for Chromium.
>>> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test
>>> suite does cause our stack to crash, but Gstreamer seemed to handle it
>>> OK. It could be that the Chromium decoder stack is passing bad values to
>>> the decoder.
>> That can be easily tested with ftrace enabled. I was just tracking down
>> an issue with gstreamer and added the following patch to the hantro
>> driver. Then:
>> echo > /sys/kernel/debug/tracing/trace
>> <run fail test>
>> cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace
>> echo > /sys/kernel/debug/tracing/trace
>> <run pass test>
>> cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace
>> # remove time stamps etc.
>> diff /tmp/{fail,pass}.trace
>> You should see whether some register programming differs between
>> gstreamer and chromium.
> I ended up using VISL to compare the controls set. I did find a bug.
> It still hard hangs after a couple frames, so I guess I'd need to use
> your method, but do printk instead.
> BTW, I wonder if we shouldn't add a reset op, if only just to stop the
> hardware? That is, do the same two register writes as in the Hantro G2
> interrupt handler.

You mean these two ?

38         vdpu_write(vpu, 0, G2_REG_INTERRUPT);
39         vdpu_write(vpu, G2_REG_CONFIG_DEC_CLK_GATE_E, G2_REG_CONFIG);

As far as I understand this, that only clears IRQ and gates the clock 
off, but it doesn't reset the IP state, does it ?

More information about the Linux-rockchip mailing list