renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin

Brian Norris briannorris at chromium.org
Wed Dec 21 14:02:43 PST 2022


Hi Mark, Sean, (and dri-devel)

On Wed, Dec 14, 2022 at 07:04:37PM -0800, Brian Norris wrote:
> On Tue, Dec 13, 2022 at 04:51:11PM +0000, Mark Brown wrote:
> > On Tue, Dec 13, 2022 at 05:56:30AM -0800, KernelCI bot wrote:
> > 
> > The KernelCI bisection bot found regressions in at least two KMS tests
> > in the Renesas tree on rk3399-gru-kevin just after the Renesas tree
> > merged up mainline:
> > 
> >    igt-kms-rockchip.kms_vblank.pipe-A-wait-forked 
> >    igt-kms-rockchip.kms_vblank.pipe-A-query-busy
> > 
> > which it bisected to ca871659ec16 ("drm/bridge: analogix_dp: Support
> > PSR-exit to disable transition").  I'm not *100%* sure I trust the
> > bisection but it sure is suspicous that two separate bisects for related
> > issues landed on the same commit.

...

> > | IGT-Version: 1.26-gf8a4a0b (aarch64) (Linux: 6.1.0 aarch64)
> > | <14>[   35.444448] [IGT] drm_read: starting subtest short-buffer-wakeup
> > | Starting subtest: short-buffer-wakeup
> > | 
> > | (| drm_read:350) CRITICAL: Test assertion failure function generate_event, file ../tests/drm_read.c:65:
> > | (drm_read:350) CRITICAL: <14>[   36.155642] [IGT] drm_read: exiting, ret=98
> > | Failed assertion: kmstest_get_vblank(fd, pipe, DRM_VBLANK_EVENT)
> > | 
> > | (drm_read:350) CRITICAL: Last errno: 22, Invalid argument
> > | Stack trace:
> > | 
> > |   #0 ../lib/igt_core.c:1933 __igt_fail_assert()
> > |   #1 [<unknown>+0xd5362770]
> > |   #2 [<unknown>+0xd536193c]
> > |   #3 [__libc_start_main+0xe8]
> > |   #4 [<unknown>+0xd5361974]
> > |   #5 [<unknown<6>[   36.162851] Console: switching to colour frame buffer device 300x100>+0xd5361974]
> > | Subtest short-buffer-wakeup failed.

...

> I'll look some more, but this might be a difference of test setup, such
> that my setup has the issue before and after that commit, but your setup
> sees a regression.

I believe this is the case; things are broken both before and after, but
depending on test sequencing, etc., they might have seemed less broken
before.

When we're failing, we're getting EINVAL from drm_vblank_get(). That
essentially means that vblank has been disabled (drm_crtc_vblank_off()).
As it happens, this is exactly what we do when we enter PSR (Panel Self
Refresh).

Now, these test cases (especially 'kms_vblank.pipe-A-wait-forked') seem
to display a static image, and then wait for a bunch of vblank events.
This is exactly the sort of case where we should enter PSR, and so we're
likely to disable vblank events. Thus, if everything is working right,
we will often hit EINVAL in ioctl(DRM_IOCTL_WAIT_VBLANK) ... ->
drm_vblank_get(), and fail the test.

As to why this appears to be a regression: like mentioned in my previous
mail, these tests tend to hose the Analogix PSR state before my patch
set. When PSR is hosed, we tend to stay with PSR disabled, and so
drm_vblank_get() doesn't return EINVAL, and the test is happy. (Never
mind that the display is likely non-functional.) [0]

So how to fix this?

1. ignore it; it's "just" a test failure, IIUC [1]
2. change the test, to skip this test if the device has PSR
3. leave vblank enabled even in the presence of PSR
4. otherwise modify the vblank ioctl interface, such that one can "wait"
   for a vblank that won't come (because the display is in self-refresh
   / there are no new frames or vblanks)

I don't know how to do #2, because this variant of PSR is intentionally
opaque to user space.

For #3: the downstream PSR support that these systems shipped with
initially did not disable vblank in PSR. But that was massively
rewritten/refactored by Sean Paul before it made it upstream, and now it
does. I tried briefly to factor that part out
(drivers/gpu/drm/rockchip/rockchip_drm_vop.c /
drm_crtc_vblank_{on,off}()), but because drm_self_refresh_helper.c
(ab?)uses the commit step to "disable" the CRTC for PSR (even though the
CRTC is not fully disabled), I tend to hit this in
drm_atomic_helper_commit_modeset_disables()->disable_outputs():

		ret = drm_crtc_vblank_get(crtc);
		WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n");

And I hit a few more problems too...

...I'm sure I could hack my way through this somehow, but this is all
sounding like it could use some advice from someone more familiar with
DRM and/or the drm_self_refresh_helper design. I've learned my way
around this a bit by necessity, but I still feel like I don't know all
the right answers here. (*cough*, Sean?)

Brian

[0] I definitely reproduce this part locally, before my patches. I can't
find non-expired CI logs for "passing" runs to be sure that's what the
CI is seeing though.

[1] Or is it? I don't really know the DRM_IOCTL_WAIT_VBLANK ABI
    contract in the presence of PSR, but I believe the upstream PSR
    support has always worked this way. I could imagine
    DRM_IOCTL_WAIT_VBLANK users might not love seeing EINVAL here
    though.



More information about the linux-arm-kernel mailing list