[PATCH] media: hantro: Check whether reset op is defined before use

Sat Aug 26 14:44:54 PDT 2023

On 8/25/23 10:52, Chen-Yu Tsai wrote:
> On Fri, Aug 25, 2023 at 4:33 PM Marek Vasut <marex at denx.de> wrote:
>>
>> On 8/25/23 09:09, Chen-Yu Tsai wrote:
>>> On Thu, Aug 24, 2023 at 9:08 PM Marek Vasut <marex at denx.de> wrote:
>>>>
>>>> On 8/24/23 12:39, Adam Ford wrote:
>>>>> On Wed, Aug 23, 2023 at 8:39 PM Marek Vasut <marex at denx.de> wrote:
>>>>>>
>>>>>> The i.MX8MM/N/P does not define the .reset op since reset of the VPU is
>>>>>> done by genpd. Check whether the .reset op is defined before calling it
>>>>>> to avoid NULL pointer dereference.
>>>>>>
>>>>>> Note that the Fixes tag is set to the commit which removed the reset op
>>>>>> from i.MX8M Hantro G2 implementation, this is because before this commit
>>>>>> all the implementations did define the .reset op.
>>>>>
>>>>> I am surprised I didn't have issues when I was testing the 8MQ and
>>>>> 8MM, but this makes sense.
>>>>
>>>> You need to trigger the VPU watchdog to trigger the crash, that means
>>>> you have to get the VPU into some weird state where it fails to decode
>>>> frame. Then it triggers the reset and ... boom.
>>>>
>>>> See this patch, that contains a gstreamer invocation to generate such
>>>> trigger condition input data:
>>>>
>>>> [PATCH] media: verisilicon: Do not enable G2 postproc downscale if
>>>> source is narrower than destination
>>>>
>>>> "
>>>> To generate input test data to trigger this bug, use e.g.:
>>>> $ gst-launch-1.0 videotestsrc !
>>>> video/x-raw,width=272,height=256,format=I420 ! \
>>>>                     vp9enc ! matroskamux ! filesink location=/tmp/test.vp9
>>>> To trigger the bug upon decoding (note that the NV12 must be forced, as
>>>> that assures the output data would pass the G2 postproc):
>>>> $ gst-launch-1.0 filesrc location=/tmp/test.vp9 ! matroskademux !
>>>> vp9parse ! \
>>>>                     v4l2slvp9dec ! video/x-raw,format=NV12 ! videoconvert
>>>> ! fbdevsink
>>>> "
>>>
>>> Does it completely recover afterwards? In my previous trials the hardware
>>> ended up in some bizzare state: while decoding succeeds, the output's md5sum
>>> didn't match up.
>>
>> Have you got a testcase that triggers this, one I can try ?
>>
>> I am not entirely sure whether this is happening here as well or not,
>> but I can imagine that the power domain went down and back up between
>> tests, so the VPU would be power cycled (and therefore reset) that way.
>> So, I think it is worth testing that.
> 
> This was last year while I was writing HEVC decoding code for Chromium.
> IIRC the SAODBLK_A_MainConcept_4 test vector from the official HEVC test
> suite does cause our stack to crash, but Gstreamer seemed to handle it
> OK. It could be that the Chromium decoder stack is passing bad values to
> the decoder.

That can be easily tested with ftrace enabled. I was just tracking down 
an issue with gstreamer and added the following patch to the hantro 
driver. Then:

echo > /sys/kernel/debug/tracing/trace
<run fail test>
cat /sys/kernel/debug/tracing/trace > /tmp/fail.trace
echo > /sys/kernel/debug/tracing/trace
<run pass test>
cat /sys/kernel/debug/tracing/trace > /tmp/pass.trace
# remove time stamps etc.
diff /tmp/{fail,pass}.trace

You should see whether some register programming differs between 
gstreamer and chromium.

diff --git a/drivers/media/platform/verisilicon/hantro.h 
b/drivers/media/platform/verisilicon/hantro.h
index dea35a501ba30..529f1ab478ec8 100644
--- a/drivers/media/platform/verisilicon/hantro.h
+++ b/drivers/media/platform/verisilicon/hantro.h
@@ -353,8 +353,7 @@ extern int hantro_debug;

  #define vpu_debug(level, fmt, args...)                         \
         do {                                                    \
-               if (hantro_debug & BIT(level))          \
-                       pr_info("%s:%d: " fmt,                  \
+               trace_printk("%s:%d: " fmt,                     \
                                  __func__, __LINE__, ##args);   \
         } while (0)