[PATCH v2 00/18] i.MX8MM GPC improvements and BLK_CTRL driver

Frieder Schrempf frieder.schrempf at kontron.de
Wed Sep 1 03:03:39 PDT 2021


Hi Lucas,

On 09.08.21 13:50, Frieder Schrempf wrote:
> On 09.08.21 13:01, Lucas Stach wrote:
>> Hi Frieder,
>>
>> Am Donnerstag, dem 05.08.2021 um 20:56 +0200 schrieb Frieder Schrempf:
>>> On 05.08.21 12:18, Frieder Schrempf wrote:
>>>> On 21.07.21 22:46, Lucas Stach wrote:
>>>>> Hi all,
>>>>>
>>>>> second revision of the GPC improvements and BLK_CTRL driver to make use
>>>>> of all the power-domains on the i.MX8MM. I'm not going to repeat the full
>>>>> blurb from the v1 cover letter here, but if you are not familiar with
>>>>> i.MX8MM power domains, it may be worth a read.
>>>>>
>>>>> This 2nd revision fixes the DT bindings to be valid yaml, some small
>>>>> failure path issues and most importantly the interaction with system
>>>>> suspend/resume. With the previous version some of the power domains
>>>>> would not come up correctly after a suspend/resume cycle.
>>>>>
>>>>> Updated testing git trees here, disclaimer still applies:
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains
>>>>> https://git.pengutronix.de/cgit/lst/linux/log/?h=imx8m-power-domains-testing
>>>>
>>>> I finally did some tests on my side using USB, GPU and DSI (no PCIe, VPU, CSI so far) and the results are promising. Thanks for the effort!
>>>>
>>>> I will try to run some more automated suspend/resume and reboot test cycles over the weekend and report the results here afterwards.
>>>>
>>>
>>> Unfortunately I got some results sooner than I had hoped. I set up a simple loop to suspend/resume every few seconds and on the first run it took around 2-3 hours for the device to lock up on resume. On the second run it took less than half an hour. I had glmark2-es2-drm running in the background, but it looks like it crashed at some point before the lockup occurred.
>>>
>>> Of course this could also be unrelated and caused by some peripheral driver or something but the first suspicion is definitely the power domains.
>>>
>>> If you have any suggestions for which debug options to enable or where to add some printks, please let me know. If I do another run I would like to make sure that the resulting logs are helpful for debugging.
>>>
>>> And I would appreciate it if someone else could try to reproduce this problem on their side. I use this simple script for testing:
>>>
>>> #!/bin/sh
>>>
>>> glmark2-es2-drm &
>>>
>>> while true;
>>> do
>>>     echo +10 > /sys/class/rtc/rtc0/wakealarm
>>>     echo mem > /sys/power/state
>>>     sleep 5
>>> done;
>>
>> Hm, that's unfortunate.
>>
>> I'm back from a two week vacation, but it looks like I won't have much
>> time available to look into this issue soon. It would be very helpful
>> if you could try to pinpoint the hang a bit more.  If you can reproduce
>> the hang with no_console_suspend you might be able to extract a bit
>> more info in which stage the hang happens (suspend, resume, TF-A, etc.)
>> If the hang is in the kernel you might be able to add some prints to
>> the suspend/resume paths to be able to track down the exact point of
>> the hang.
>>
>> I'm happy to look into the issue once it's better known where to look,
>> but I fear that I won't have time to do the above investigation myself
>> short term. Frieder, is this something you could help with over the
>> next few days?
> 
> I will see if I can find some time to track down the issue at least a little bit more. But I imagine it could get quite tedious if it takes up to several hours to reproduce the issue and I don't have much time to spare.
> 
> @Peng, @Adam and everyone else: Any chance you could set up a similar test and try to reproduce this?
> 
> On the other hand reboot cycle testing didn't show any lockup problems over more than 24 hours, so it seems like the issue is limited to resume.

I ran a few more suspend/resume cycles and watched the log. For the
first 2.5 hours nothing noteworthy happened, except that glmark2 crashed
again at some point.

Then suddenly the following lines were printed while suspending:

  imx-pgc imx-pgc-domain.6: failed to command PGC
  PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -110
  imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -110
  PM: Some devices failed to suspend, or early wake event detected

After that, suspend keeps failing with the following on each attempt:

  PM: dpm_run_callback(): platform_pm_suspend+0x0/0x78 returns -22
  imx8m-blk-ctrl 38330000.blk-ctrl: PM: failed to suspend: error -22
  PM: Some devices failed to suspend, or early wake event detected
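
For reference, the two return codes in the logs above correspond to the
standard Linux errno values ETIMEDOUT (110) and EINVAL (22). A minimal
sketch of a lookup helper follows; the function name errno_name is
made up for this mail and only the two values seen here are covered:

```shell
#!/bin/sh
# Map the negative return codes from the log to their Linux errno names.
# errno_name is a hypothetical helper; it knows only the two values
# that show up in the logs above.
errno_name() {
    # Strip a leading "-" so both "-110" and "110" are accepted.
    case "${1#-}" in
        110) echo "ETIMEDOUT" ;;
        22)  echo "EINVAL" ;;
        *)   echo "unknown" ;;
    esac
}

errno_name -110
errno_name -22
```

So the first failure is a timeout while commanding the PGC, and the
later attempts are rejected with an invalid-argument error.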

So far I haven't run into a lockup again with this test, but I will
continue trying to reproduce it and retrieve more information.
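
In case it helps anyone reproducing this, here is a rough sketch of a
variant of the test loop that counts the cycles and stops at the first
failed suspend, so the cycle number and the last kernel messages are
easy to find. This is not the script I actually ran. The names
WAKEALARM, STATE, MAX_CYCLES, PAUSE, suspend_once and run_cycles are
made up for this sketch, and it assumes the shell's echo reports a
failed write to /sys/power/state through its exit status. Starting
glmark2-es2-drm in the background is left out here.

```shell
#!/bin/sh
# Hypothetical variant of the suspend/resume loop: count cycles and stop
# at the first failed suspend. All variable and function names are
# assumptions made for this sketch.
WAKEALARM=${WAKEALARM:-/sys/class/rtc/rtc0/wakealarm}
STATE=${STATE:-/sys/power/state}
MAX_CYCLES=${MAX_CYCLES:-0}   # 0 means: run until a suspend fails
PAUSE=${PAUSE:-5}             # seconds to wait between cycles

# Arm the RTC wake alarm 10 seconds ahead, then request suspend-to-RAM.
# Returns non-zero if either write fails.
suspend_once() {
    echo +10 > "$WAKEALARM" || return 1
    echo mem > "$STATE"
}

run_cycles() {
    n=0
    while [ "$MAX_CYCLES" -eq 0 ] || [ "$n" -lt "$MAX_CYCLES" ]; do
        n=$((n + 1))
        if ! suspend_once; then
            echo "cycle $n: suspend failed"
            # Show the most recent kernel messages for the failed attempt.
            dmesg | tail -n 20
            return 1
        fi
        echo "cycle $n: resumed"
        sleep "$PAUSE"
    done
}

# Start the loop only when invoked as: ./script.sh run
if [ "${1:-}" = "run" ]; then
    run_cycles
fi
```

Booting with no_console_suspend, as Lucas suggested, should keep the
console output visible around the failing cycle.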

Best regards
Frieder



More information about the linux-arm-kernel mailing list