next/master boot: 273 boots: 63 failed, 209 passed with 1 untried/unknown (next-20171106)
Robin Murphy
robin.murphy at arm.com
Wed Nov 8 07:55:38 PST 2017
On 08/11/17 15:19, Guillaume Tucker wrote:
> On 07/11/17 11:43, Guillaume Tucker wrote:
>> On 07/11/17 10:55, Mark Brown wrote:
>>> On Tue, Nov 07, 2017 at 10:12:59AM +0000, Jon Hunter wrote:
>>>> On 06/11/17 19:17, Mark Brown wrote:
>>>
>>>>>> multi_v7_defconfig:
>>>>>> tegra124-nyan-big:
>>>>>> lab-collabora: failing since 2 days (last pass:
>>>>>> next-20171102 - first fail: next-20171103)
>>>
>>>> Thanks for the report. I have been looking into a failure on nyan-big
>>>> [0], but this one looks like a new failure. I will take a look.
>>>
>>> Guillaume Tucker has been bisecting this with the shiny new bisection
>>> code he's testing; he was saying on IRC he thinks he's found the
>>> offending commit:
>>>
>>>
>>> https://people.collabora.com/~gtucker/tmp/bisect-tegra-4.14.rc8-next-20171106.txt
>>>
>>>
>>> (not CCing Johannes yet)
>>
>> Please take this with a pinch of salt, I'm now running some extra
>> boot tests to prove it. If you look at this log, all the boots
>> passed which is a bit suspicious. I did build and boot the
>> revision it found with multi_v7_defconfig on tegra124 and it
>> passed, so it looks like this commit may not have anything to do
>> with the boot failure. The automated bisection is still experimental.
>>
>> Passing LAVA boot test with this revision:
>>
>> https://lava.collabora.co.uk/scheduler/job/976375
>>
>> I've started a slightly different bisection job now on
>> next-20171107 and the common ancestor between next and mainline;
>> results can take a few hours to come back.
>
> After a few more automated bisection attempts and a bug fix in
> LAVA, I've now found at least one potentially breaking commit:
>
> commit d89e2378a97fafdc74cbf997e7c88af75b81610a
> Author: Robin Murphy <robin.murphy at arm.com>
> Date: Thu Oct 12 16:56:14 2017 +0100
>
> drivers: flag buses which demand DMA configuration
>
>
> I've run some boot tests manually with this revision and then
> also after reverting it in-place, these respectively failed and
> passed:
>
> * d89e2378, failed:
> https://lava.collabora.co.uk/scheduler/job/978968
>
> * d89e2378 reverted, passed:
> https://lava.collabora.co.uk/scheduler/job/978969
>
>
> I then went on and tried the same but on top of next-20171108 and
> found that they both failed:
>
> * next-20171108, failed:
> https://lava.collabora.co.uk/scheduler/job/979063
>
> * next-20171108 with d89e2378 reverted, failed as well:
> https://lava.collabora.co.uk/scheduler/job/979167
>
>
> So this shows there is almost certainly another offending commit
> in -next. The errors in the two cases are not quite the same: the
> last one is triggered by a BUG whereas the first one is a NULL
> pointer dereference (I haven't looked any further). Also, I don't
> think there's any fix for d89e2378a97fafdc74cbf997e7c88af75b81610a,
> which is currently still in next.

The fix was actually posted before said commit was even written:

https://patchwork.kernel.org/patch/9967847/

What is currently queued in the DMA tree fell out of the discussion on
patch 2 of that series, but I kind of assumed the host1x folks would
still take patch 1; I guess that hasn't happened.

Robin.

>
> Note: This happens to be a very good example of running a
> kernelci.org bisection on a real issue; it's quite a bit of a
> pipe cleaner. I'll now see if there's a way to bisect what looks
> like another breaking change in-between.
>
> Guillaume
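
For readers following the revert tests quoted above, the reasoning can be
sketched as a small decision table. This is only an illustration, not part
of kernelci.org or LAVA: the RevertTest type and the interpret() helper are
hypothetical names, and the two sample rows simply restate the pass/fail
results reported in the thread.

```python
from typing import NamedTuple


class RevertTest(NamedTuple):
    """One pair of boot results on a given base (hypothetical helper type)."""
    base: str            # tree or revision the pair of boots was run on
    with_commit: bool    # True if the boot passed with the suspect commit present
    reverted: bool       # True if the boot passed with the suspect commit reverted


def interpret(t: RevertTest) -> str:
    """Map a pair of boot results to what it says about the suspect commit."""
    if not t.with_commit and t.reverted:
        return "suspect commit breaks the boot on this base"
    if not t.with_commit and not t.reverted:
        return "boot fails even with the revert: another breakage is present"
    if t.with_commit and t.reverted:
        return "boot passes either way: suspect commit not implicated"
    return "boot passes only with the suspect commit: the revert breaks something"


# Results as reported in the thread (jobs 978968/978969 and 979063/979167).
results = [
    RevertTest("d89e2378", with_commit=False, reverted=True),
    RevertTest("next-20171108", with_commit=False, reverted=False),
]

for r in results:
    print(f"{r.base}: {interpret(r)}")
```

The second row is what points to an additional offending commit in -next:
reverting d89e2378 is no longer enough to get a passing boot there.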