[PATCH v11] drm: Add initial ci/ subdirectory
Helen Koike
helen.koike at collabora.com
Wed Aug 30 08:14:04 PDT 2023
On 30/08/2023 11:57, Maxime Ripard wrote:
> On Wed, Aug 30, 2023 at 10:24:49AM -0300, Helen Koike wrote:
>> Hi all,
>>
>> Thanks for your comments.
>>
>> On 30/08/2023 08:37, Maxime Ripard wrote:
>>> On Wed, Aug 30, 2023 at 01:58:31PM +0300, Jani Nikula wrote:
>>>> On Wed, 30 Aug 2023, Maxime Ripard <mripard at kernel.org> wrote:
>>>>> On Tue, Aug 22, 2023 at 04:26:06PM +0200, Daniel Vetter wrote:
>>>>>> On Fri, Aug 11, 2023 at 02:19:53PM -0300, Helen Koike wrote:
>>>>>>> From: Tomeu Vizoso <tomeu.vizoso at collabora.com>
>>>>>>>
>>>>>>> Developers can easily execute several tests on different devices
>>>>>>> by just pushing their branch to their fork in a repository hosted
>>>>>>> on gitlab.freedesktop.org which has an infrastructure to run jobs
>>>>>>> in several runners and farms with different devices.
>>>>>>>
>>>>>>> There are also other automated tools that uprev dependencies,
>>>>>>> monitor the infra, and so on that are already used by the Mesa
>>>>>>> project, and we can reuse them too.
>>>>>>>
>>>>>>> Also, store expectations about what the DRM drivers are supposed
>>>>>>> to pass in the IGT test suite. By storing the test expectations
>>>>>>> along with the code, we can make sure both stay in sync with each
>>>>>>> other so we can know when a code change breaks those expectations.
>>>>>>>
>>>>>>> Also, include a configuration file that points to the out-of-tree
>>>>>>> CI scripts.
>>>>>>>
>>>>>>> This will allow all contributors to drm to reuse the infrastructure
>>>>>>> already in gitlab.freedesktop.org to test the driver on several
>>>>>>> generations of the hardware.
>>>>>>>
>>>>>>> Signed-off-by: Tomeu Vizoso <tomeu.vizoso at collabora.com>
>>>>>>> Signed-off-by: Helen Koike <helen.koike at collabora.com>
>>>>>>> Acked-by: Daniel Stone <daniels at collabora.com>
>>>>>>> Acked-by: Rob Clark <robdclark at gmail.com>
>>>>>>> Tested-by: Rob Clark <robdclark at gmail.com>
>>>>>>
>>>>>> Ok I pushed this into a topic/drm-ci branch in drm.git and asked sfr to
>>>>>> include that branch in linux-next.
>>>>>>
>>>>>> But also I'd like to see a lot more acks here, we should be able to at
>>>>>> least pile up a bunch of (driver) maintainers from drm-misc in support of
>>>>>> this. Also maybe media, at least I've heard noises that they're maybe
>>>>>> interested too? Plus anyone else, the more the better.
>>>>>
>>>>> I'm not really convinced by that approach at all, and most of the issues
>>>>> I see are shown by the follow-up series here:
>>>>
>>>> I'm not fully convinced either, more like "let's see". In that narrow
>>>> sense, ack. I don't see harm in trying, if you're also open to backing
>>>> off in case it does not pan out.
>>>>
>>>>> https://lore.kernel.org/dri-devel/20230825122435.316272-1-vignesh.raman@collabora.com/
>>>>>
>>>>> * We hardcode a CI farm setup into the kernel
>>
>>
>> These could be out of tree.
>>
>> There is a version outside the kernel tree where you just point the CI
>> configuration to a url:
>> https://gitlab.freedesktop.org/gfx-ci/drm-ci/-/merge_requests/1
>>
>> We were discussing it here https://www.linuxtv.org/cgi-bin/mailman/private/linuxtv-ci/2023-August/000027.html
>
> It looks like it's private
>
>> (I guess Sima's reply didn't get onto the mailing list.) But the argument
>> for not having an out-of-tree repo comes from historical bad experience
>> with having to keep the kernel in sync with that code, which can become
>> messy.
>
> My point is that even though the test strategy might be considered a
> "property" of the kernel, how you execute it is definitely not and you
> will have as many setups as you have CI farms. You can't put that into
> the kernel, just like we don't put the kernel command line in it for
> example.
>
>>>>>
>>>>> * We cannot trust that the code being run is actually the one being
>>>>> pushed into gitlab
>>
>>
>> We can improve this if it is a requirement.
>> For DTS configuration we can work with overlays (which is the current
>> modification on that patchset). For other changes that are not suitable
>> for upstream (and those should be rare), we can see whether the
>> `-external-fixes` approach or something else works; we can check it case
>> by case to understand why it is not suitable for upstream.
>
> The existence of that branch is in itself an issue to me. Again, it's a
> matter of trust. How can I trust a branch I barely know about, whose
> development process is not clear and which isn't reviewed by any of the
> maintainers of the code that might affect the test outcomes?
>
> Or put another way, if I run the tests on my machine, it won't work. Why
> should it work on the CI farm? The branch itself is broken. It might not
> be due to any of the work I did, but it's broken still.
>
>>>>>
>>>>> * IMO, and I know we disagree here, any IGT test we enable for a given
>>>>> platform should work, period. Allowing failures and flaky tests just
>>>>> sweeps whatever issue is there under the rug. If the test is at
>>>>> fault, we should fix the test, if the driver / kernel is at fault,
>>>>> then I certainly want to know about it.
>>
>> I believe we need a baseline and to understand the current status of the
>> tests. If you check the xfails folder in the patch, you can see that I had
>> to add a few tests to *-skips.txt since those tests crash the system, and
>> others to *-fails.txt since they consistently fail.
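
(For illustration only, below is roughly the kind of comparison a CI job
can do against those expectation files. The file names, the results
format and the script itself are made up for this sketch, not what the
patch actually ships:)

# check_results.py - minimal sketch, not the real drm-ci tooling.
# Compares an IGT result list against per-platform expectation files.
import sys

def load_list(path):
    """Read one test name per line, ignoring blanks and '#' comments."""
    try:
        with open(path) as f:
            return {l.strip() for l in f if l.strip() and not l.startswith("#")}
    except FileNotFoundError:
        return set()

def main(results, fails, skips):
    expected_fails = load_list(fails)   # e.g. something like mt8173-fails.txt
    known_skips = load_list(skips)      # e.g. something like mt8173-skips.txt
    unexpected = []
    with open(results) as f:            # assumed "testname,status" lines
        for line in f:
            name, status = line.strip().split(",", 1)
            if name in known_skips:
                continue                # not run at all on this platform
            if status == "fail" and name not in expected_fails:
                unexpected.append(name)
    for name in unexpected:
        print("unexpected failure:", name)
    return 1 if unexpected else 0

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:4]))

The point being: the job only goes red on failures that are not already
listed, which is exactly the regression-testing behaviour discussed
further down.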
>
> I agree that we need a baseline, but that baseline should be defined by
> the tests own merits, not their outcome on a particular platform.
>
> In other words, I want all drivers to follow that baseline, and if they
> don't it's a bug we should fix, and we should be vocal about it. We
> shouldn't ignore the test because it's broken.
>
> Going back to the example I used previously, kms_hdmi_inject@inject-4k
> shouldn't fail on mt8173, ever. That's a bug. Ignoring it and reporting
> that "all tests are good" isn't ok. There's something wrong with that
> driver and we should fix it.
>
> Or at the very least, explain in much detail what the breakage is, how we
> noticed it, why we can't fix it, and how to reproduce it.
>
> Because in its current state, there's no chance we'll ever go over that
> test list and remove entries from it. Or even know whether, if we ever
> fix a bug somewhere, we should remove a flaky or failing test.
>
>> Since the "any IGT test we enable for a given platform should work" is not a
>> reality atm,
>
> Thanks for the reality check, but it's very much doable: we're in
> control of the test suite.
>
>> we need a clear view of which tests don't live up to that, so we can
>> start fixing them. First we need to be aware of the issues, otherwise we
>> will stay on the "no tests, no failures" ground :)
>
> I think we have somewhat contradicting goals. You want to do regression
> testing, so whatever test used to work in the past should keep working.
> That's fine, but it's different from "expectations about what the DRM
> drivers are supposed to pass in the IGT test suite", which is about
> validation, i.e. "all KMS drivers must behave this way".
I see. Indeed, for me it is more about regression testing.

We could have a configuration where developers choose to run either
regression tests or overall validation (I understand this is not the
point, just saying we could have both somehow).

We could also have a policy: if you want to enable a certain device in
the CI, you need to make sure it passes all tests first, which would
force people to go fix the issues, but maybe that would be too big a
barrier.

I'm afraid that, if a test fails (and it is a clear bug), people would
just say "it works for most cases, this is not a priority to fix" and
start ignoring the CI. This is why I think regression testing is a good
way to start.

Anyway, just brainstorming :)

I really hope we can find a nice, useful solution for the community.
Regards,
Helen
>
> I guess for regression testing you very much would like that all-green
> dashboard, and it's understandable. For validation, we don't care, and we
> should be as vocal as possible about reporting broken drivers.
>
> Eventually, we should have regression testing over the validation test
> suite.
>
> It's not about reality. We should be clear what we expect from those
> test suites, and not claim that it's something it's not.
>
>>>> At least for display, where this also depends on peripheral hardware,
>>>> it's not an easy problem, really.
>>>
>>> Aside from the Chamelium tests, which tests actually rely on peripheral
>>> hardware? On EDID and hotplug, sure, but that can easily be set up from
>>> userspace, or with something like
>>>
>>> https://www.lindy-international.com/HDMI-2-0-EDID-Emulator.htm?websale8=ld0101.ld021102&pi=32115
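
(Side note: the EDID/hotplug part can indeed be driven from userspace via
the connector debugfs files, no extra hardware needed. A rough sketch,
assuming root, a mounted debugfs, and example card/connector/EDID paths
that will differ per board:)

# edid_force.py - rough sketch: fake a connected monitor from userspace.
# The DRM debugfs "force" and "edid_override" files are used here;
# card 0, HDMI-A-1 and the EDID blob path are examples only.
from pathlib import Path

conn = Path("/sys/kernel/debug/dri/0/HDMI-A-1")

# Pretend something is plugged in even when the port is empty.
(conn / "force").write_text("on")

# Provide a fixed EDID so the mode list is deterministic across runs.
edid = Path("/lib/firmware/edid/1920x1080.bin").read_bytes()
(conn / "edid_override").write_bytes(edid)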
>>>
>>>> How reliable do you need it to be? How many nines? Who is going to
>>>> debug the issues that need hundreds or thousands of runs to reproduce?
>>>> If a commit makes some test less reliable, how long is it going to
>>>> take to even see that or pinpoint that?
>>>
>>> I mean, that's also true for failures or success then. How many times do
>>> you need a test to run properly to qualify it as a meaningful test? How
>>> do you know that it's not a flaky test?
>>>
>>> Ultimately, it's about trust. If, for a given test that just failed, I
>>> can't be certain that it's because of the branch I just submitted, I
>>> will just ignore the test results after a while.
>>>
>>> This is already what plagues kernelci, and we should do better.
>>
>> This is something that is really nice about Mesa3D: a patch only gets
>> merged if the tests pass, which forces people not to ignore them, and
>> forces the code to be fixed and the CI to be constantly maintained.
>>
>> Of course there are bad days there, but there is real value. Nice thread to
>> check: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8635 (thanks Alyssa
>> for the feedback).
>
> I'm sure it works great for Mesa, but I'm also sure it doesn't ignore
> CTS reports that a particular device isn't a valid OpenGL or Vulkan
> implementation anymore.
>
>>> And I'm sorry, but if some part of the kernel or driver just isn't
>>> reliable, then we shouldn't claim it is (except for all the times it
>>> isn't). If no one has the time to look into it, fine, but flagging it
>>> as a flaky test doesn't help anyone.
>>
>> At least we would know what there is that isn't reliable.
>
> We would too if the test was reported as failed. But our preferred
> approaches to doing so diverge.
>
> Maxime