[PATCH v2 4/9] drm/panthor: Implement optional reset
Boris Brezillon
boris.brezillon at collabora.com
Wed Sep 3 23:22:24 PDT 2025
Hello Marek,
On Wed, 3 Sep 2025 23:44:59 +0200
Marek Vasut <marek.vasut at mailbox.org> wrote:
> On 3/25/25 3:52 PM, Boris Brezillon wrote:
>
> Hello Boris,
>
> sorry for the late reply.
>
> >>>>>>> Hm, that might be the cause of the fast reset issue (which is a fast
> >>>>>>> resume more than a fast reset BTW): if you re-assert the reset line on
> >>>>>>> runtime suspend, I guess this causes a full GPU reset, and the MCU ends
> >>>>>>> up in a state where it needs a slow reset (all data sections reset to
> >>>>>>> their initial state). Can you try to move the reset_control_[de]assert
> >>>>>>> to the unplug/init functions?
> >>>>>> Is it correct to assume that if I remove all reset_control_assert()
> >>>>>> calls (and keep only the _deassert() calls), the slow resume problem
> >>>>>> should go away too?
> >>>>>
> >>>>> Yeah, dropping the _assert()s should do the trick.
> >>>> Hmmm, no, that does not help. I was hoping maybe NXP can chime in and
> >>>> suggest something too?
> >>>
> >>> Can you try keeping all the clks/regulators/power-domains/... on after
> >>> init, and see if the fast resume works with that? If it does,
> >>> re-introduce one resource at a time to find out which one causes the
> >>> MCU to lose its state.
> >>
> >> I already tried that too. I spent quite a while until I reached that L2
> >> workaround in fact.
> >
> > So, with your RPM suspend/resume being NOPs, it still doesn't work?
> > Unless the FW is doing something behind our back, I don't really see
> > why this would fail on your platform, but not on the rk3588. Are you
> > sure the power domains are kept on at all times? I'm asking because if
> > you linked all the PDs, the on/off sequence is automatically handled by
> > the RPM core at suspend/resume time.
>
> I revisited this now.
>
> Can you please test the following patch (also attached) on one of your
> devices, and tell me what the status is at the end. The diff sets the
> GLB_HALT bit and then clears it again, which I suspect should first halt
> the GPU and (this is what I am unsure about) then again un-halt/resume
> the GPU?
It doesn't work like that. What you're describing is like executing
"shutdown" on your terminal and then typing "boot" on the keyboard
after your computer has been shut down.
>
> "
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c
> b/drivers/gpu/drm/panthor/panthor_fw.c
> index 9bf06e55eaeea..57c0d4fd29aa2 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1087,8 +1087,16 @@ void panthor_fw_pre_reset(struct panthor_device
> *ptdev, bool on_hang)
> struct panthor_fw_global_iface *glb_iface =
> panthor_fw_get_glb_iface(ptdev);
> u32 status;
>
> +pr_err("%s[%d] pre-halt status=%x\n", __func__, __LINE__,
> gpu_read(ptdev, MCU_STATUS));
> +
> panthor_fw_update_reqs(glb_iface, req, GLB_HALT, GLB_HALT);
> gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
> +mdelay(100);
> +pr_err("%s[%d] likely-halted status=%x\n", __func__, __LINE__,
> gpu_read(ptdev, MCU_STATUS));
> + panthor_fw_update_reqs(glb_iface, req, 0, GLB_HALT);
> +mdelay(100);
> +pr_err("%s[%d] likely-running ? status=%x\n", __func__, __LINE__,
> gpu_read(ptdev, MCU_STATUS));
> +
> if (!gpu_read_poll_timeout(ptdev, MCU_STATUS, status,
> status == MCU_STATUS_HALT, 10,
> 100000)) {
> "
>
> In my case, the relevant output looks like this:
>
> "
> [ 3.326805] panthor_fw_pre_reset[1090] pre-halt status=1
> [ 3.432151] panthor_fw_pre_reset[1095] likely-halted status=2
> [ 3.542179] panthor_fw_pre_reset[1098] likely-running ? status=2
> "
>
> That means, the GPU remains halted at the end, even if the "GLB_HALT"
> bit is cleared before the last print. The clearing of GLB_HALT is also
> what panthor_fw_post_reset() does.
After the halt has been processed by the FW, the memory region where
you check the halt status again is inert, since the micro-controller
(MCU) that is supposed to update those bits is off at this point. The FW
interface is really just a shared memory region between the CPU and
MCU, nothing more.
>
> I suspect the extra soft reset I did before "un-halted" the GPU and
> allowed it to proceed.
Hm, not quite. I mean, you still need to explicitly boot the MCU after
a reset, which is what the write to MCU_CONTROL [1] does. What the
soft-reset does though, is reset all GPU blocks, including the MCU.
This means the MCU starts from a fresh state when you reach [1].
If I had to guess, I'd say something is messed up when the GPU is
halted, and you need a soft-reset to recover from that. Unfortunately,
I don't know enough about what your FW is doing to help. Maybe
Arm/Freescale could...
>
> I wonder if there is some way to un-halt the GPU using some gpu_write()
> direct register access, is there?
That's MCU_CONTROL, yes. And it's done here [1] already.
> Maybe the GPU remains halted because
> setting GLB_HALT stops command stream processing, and the GPU never
> samples the clearing of GLB_HALT and therefore remains halted forever?
Exactly that, and that's expected.
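
To make the point concrete, here is a toy user-space model of the
behaviour described above. It is only a sketch: every name in it
(toy_gpu, toy_fw_tick, toy_mcu_boot, the TOY_* constants) is made up for
illustration and none of it is panthor code. The status values 1 and 2
simply mirror the numbers printed in your log. It assumes the FW only
samples the shared request bits while the MCU is running, and that
booting the MCU is a separate action standing in for the MCU_CONTROL
write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout: a toy model, not the panthor driver. */
#define TOY_REQ_HALT        (1u << 0) /* request bit in the shared region */
#define TOY_STATUS_ENABLED  1u        /* matches "status=1" in the log */
#define TOY_STATUS_HALT     2u        /* matches "status=2" in the log */

struct toy_gpu {
	uint32_t shared_req;  /* shared memory: written by CPU, read by FW */
	uint32_t mcu_status;  /* status register: reflects the MCU state */
	bool mcu_running;     /* whether the MCU is executing FW code */
};

/* One FW iteration: only a running MCU ever looks at the shared region. */
static void toy_fw_tick(struct toy_gpu *g)
{
	if (!g->mcu_running)
		return; /* MCU is off: nobody samples shared_req anymore */

	if (g->shared_req & TOY_REQ_HALT) {
		g->mcu_running = false;
		g->mcu_status = TOY_STATUS_HALT;
	}
}

/* Stand-in for the explicit boot step (the MCU_CONTROL write). */
static void toy_mcu_boot(struct toy_gpu *g)
{
	g->mcu_running = true;
	g->mcu_status = TOY_STATUS_ENABLED;
}

/* Mimics the debug patch: set the halt bit, then clear it again. */
static uint32_t toy_halt_then_clear(struct toy_gpu *g)
{
	assert(g->mcu_running); /* the experiment starts from a running MCU */

	g->shared_req |= TOY_REQ_HALT;
	toy_fw_tick(g);  /* FW sees the request and halts */

	g->shared_req &= ~TOY_REQ_HALT;
	toy_fw_tick(g);  /* no effect: the MCU is no longer running */

	return g->mcu_status;
}
```

Starting from a running MCU, toy_halt_then_clear() leaves the status at
TOY_STATUS_HALT, and only toy_mcu_boot() brings it back to
TOY_STATUS_ENABLED. That is the same shape as what you observed.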
Regards,
Boris
[1] https://elixir.bootlin.com/linux/v6.16.4/source/drivers/gpu/drm/panthor/panthor_fw.c#L1034