next/master boot: 270 boots: 35 failed, 213 passed with 20 offline, 2 untried/unknown (next-20171207)
Marek Szyprowski
m.szyprowski at samsung.com
Mon Dec 11 02:43:19 PST 2017
Hi Shuah,
Do you have a bit of spare time for Exynos kernel development? Could you
investigate why Peach-Pi(t) Chromebooks fail to boot with recent
kernels? If I remember correctly, you had access to those boards.
The failure itself seems to be caused by the following patch:
https://patchwork.kernel.org/patch/10067711/ which got merged as
510353a63796 to v4.15-rc3 and fixed the boot issue on Snow Chromebook
(Exynos 5250 based).
However, I don't see any path by which it could deadlock and cause a boot
failure on the Exynos 5420/5800 Chromebooks. I don't have access to Peach
Chromebooks to reproduce the problem, and our Snow works fine.
Here are some logs:
v4.15-rc3 failure:
https://storage.kernelci.org/mainline/master/v4.15-rc3/arm/exynos_defconfig/lab-collabora/boot-exynos5800-peach-pi.html
next-20171207 first next failure:
https://storage.kernelci.org/next/master/next-20171207/arm/exynos_defconfig/lab-collabora/boot-exynos5800-peach-pi.html
Here is a report on the first boot failure in linux-next:
On 2017-12-11 10:28, Marek Szyprowski wrote:
> Hi Stephen,
>
> On 2017-12-08 17:59, Stephen Boyd wrote:
>> On 12/08, Marek Szyprowski wrote:
>>> On 2017-12-08 13:33, Krzysztof Kozlowski wrote:
>>>> On Fri, Dec 8, 2017 at 1:27 PM, Mark Brown <broonie at kernel.org> wrote:
>>>>> On Fri, Dec 08, 2017 at 12:20:07PM +0000, Mark Brown wrote:
>>>>>> On Thu, Dec 07, 2017 at 03:54:47PM -0800, kernelci.org bot wrote:
>>>>>>
>>>>>> Today's -next failed to boot on peach-pi:
>>>>>>
>>>>>>> exynos_defconfig:
>>>>>>> exynos5800-peach-pi:
>>>>>>> lab-collabora: new failure (last pass: next-20171205)
>>>>>> with details at
>>>>>> https://kernelci.org/boot/id/5a2a2e7859b5141bc2afa17c/
>>>>>> (including logs and comparisons with other boots, the last good boot was
>>>>>> Wednesday). It looks like it hangs somewhere late on in boot, the last
>>>>>> output on the console is:
>>>>>>
>>>>>> [ 4.827139] smsc95xx 3-1.1:1.0 eth0: register 'smsc95xx' at usb-xhci-hcd.3.auto-1.1, smsc95xx USB 2.0 Ethernet, 94:eb:2c:00:03:c0
>>>>>> [ 5.781037] dma-pl330 3880000.adma: Loaded driver for PL330 DMAC-241330
>>>>>> [ 5.786247] dma-pl330 3880000.adma: DBUFF-4x8bytes Num_Chans-6 Num_Peri-16 Num_Events-6
>>>>>> [ 5.819200] dma-pl330 3880000.adma: PM domain MAU will not be powered off
>>>>>> [ 64.529228] random: crng init done
>>>>>>
>>>>>> and there are earlier failures to instantiate the display.
>>>>> I just noticed that further up the log there's a lockdep splat with a
>>>>> conflict between the genpd and clock API locking - an ABBA issue with
>>>>> genpd->mlock and the clock API prepare_lock.
>>>> +Cc Marek Szyprowski,
>>>>
>>>> The lockdep issue and display failures (including regulator warning)
>>>> were present for some time. They also appear in boot log for next-20171206
>>>> (https://storage.kernelci.org/next/master/next-20171206/arm/exynos_defconfig/lab-collabora/boot-exynos5800-peach-pi.html).
>>>> The difference is that 20171208 hangs on "random: crng init done"
>>>> which did not appear before at all.
>> I haven't looked at the lockdep splat yet, but is that happening
>> because of runtime PM usage by the clk framework?
>
> This is a false positive. Lockdep doesn't distinguish between individual
> domain instances. Only some instances of the Exynos power domains use clocks
> (as an old workaround for the lack of a way to integrate proper clock
> rate/topology restoration after a power off/on cycle into the clock provider
> driver).
>
> The clock controllers that implement runtime PM are assigned to power
> domains that don't touch clocks at all.
>
> I still have no idea how to fix the code to make lockdep happy.
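
To illustrate the false positive described in the quoted message above, here is
a minimal Python sketch (not kernel code) of a checker that records lock
ordering per lock class rather than per lock instance. The OrderChecker class,
the demo() function and the "domain-A"/"domain-B" names are hypothetical and
only model the situation: domain A takes the clock prepare lock while holding
its own genpd lock, and a runtime-PM clock controller takes the lock of a
different domain B (which never touches clocks) while holding the prepare lock.

```python
from collections import defaultdict


class OrderChecker:
    """Toy lock-order checker that tracks lock classes, not instances."""

    def __init__(self):
        # class -> set of classes that were acquired while it was held
        self.edges = defaultdict(set)
        # stack of (lock_class, instance) currently held
        self.held = []

    def acquire(self, lock_class, instance):
        reports = []
        for held_class, _ in self.held:
            # A recorded edge in the opposite direction means these two
            # classes were previously taken in the reverse order.
            if held_class in self.edges[lock_class]:
                reports.append((held_class, lock_class))
            self.edges[held_class].add(lock_class)
        self.held.append((lock_class, instance))
        return reports

    def release(self):
        self.held.pop()


def demo():
    checker = OrderChecker()
    reports = []

    # Power domain A uses clocks: genpd lock first, then the clk prepare lock.
    reports += checker.acquire("genpd->mlock", "domain-A")
    reports += checker.acquire("prepare_lock", "clk")
    checker.release()
    checker.release()

    # Clock controller with runtime PM sits in domain B: prepare lock first,
    # then the genpd lock of domain B, which is a different lock instance.
    reports += checker.acquire("prepare_lock", "clk")
    reports += checker.acquire("genpd->mlock", "domain-B")
    checker.release()
    checker.release()

    return reports


if __name__ == "__main__":
    print(demo())
```

The checker reports an inversion between prepare_lock and genpd->mlock even
though the two genpd locks belong to different domains, because the instance
name is never consulted when the ordering is compared. Under the assumption
that domain B never takes the prepare lock, no real ABBA deadlock is possible
in this model.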
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland