corrupt clk_core instance in clk_disable_unused_subtree (ARM Meson MX platform)

Martin Blumenstingl martin.blumenstingl at googlemail.com
Sat May 20 08:49:44 PDT 2017


On Thu, May 18, 2017 at 8:36 PM, Martin Blumenstingl
<martin.blumenstingl at googlemail.com> wrote:
> On Sun, May 14, 2017 at 11:20 PM, Martin Blumenstingl
> <martin.blumenstingl at googlemail.com> wrote:
>> Hello,
>>
>> it seems that I am seeing some strange memory corruption on one of my
>> Amlogic Meson MX (32-bit) devices.
>> disclaimer: I have some patches in my tree which are not mainlined yet
>> (see [0]), but I cannot see how any of these patches would cause memory
>> corruption of a clk_core instance.
>>
>> Oleg (who is CC'ed) has first reported this when testing my kernel tree: [1]
>> in the meantime I have rebased all of my patches to Linus' mainline
>> tree, commit 0fcc3ab23d7395f58e8ab0834e7913e2e4314a83 [2]
>>
>> what I am seeing is a NULL deref in clk_disable_unused_subtree; the full
>> log is attached and can also be found here: [3]
>> an explanation of what seems to be going on in my own words is:
>> - in line #5 of the log the internal PWM mux clock for the first PWM
>> channel is being registered (everything looks good with
>> clk_core=0xeddfbf80 and clk_hw=0xeddfbf30)
>> - the default parent of this mux is "xtal"
>> - in line #31 of the log the "disable unused clocks" cleanup starts
>> and checks the first child of the "xtal" clock and seems to find
>> clk_core=0xeddfbf80 *BUT* clk_hw=0x00000003
>> - this doesn't look right, so a crash as soon as clk_hw is
>> dereferenced is to be expected
>>
>> I also attached the patch which introduces this additional logspam -
>> just in case anyone wants to know what these values mean exactly.
>>
>> now the interesting part:
>> I can reproduce this with multi_v7_defconfig and
>> arch/arm/boot/dts/meson8m2-m8s.dts from my tree.
>> if I leave everything as it is and *only* enable CONFIG_DEBUG_SPINLOCK
>> then this crash goes away. so this *might* be a race-condition
>> somewhere...
> a user named "wilson2000" (since I missed you on IRC: thank you!)
> pointed out on IRC that there's a memory corruption bug in v4.11 and
> early v4.12 kernels which is fixed by [0] "perf/core: Avoid removing
> shared pmu_context on unregister"
> I have not tested this yet but this looks suspicious (so the common
> clock framework may be innocent). I will report back once I have had
> time to test this.
I applied that patch and re-tested this: unfortunately it still
crashes with the same symptoms

so I am still interested in any kind of hint
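
For anyone who wants to narrow this down: the symptom in the quoted
report (clk_core=0xeddfbf80 still intact, but clk_hw=0x00000003) could in
principle be caught before the dereference with a consistency check
along the lines below. This is only a userspace sketch, not kernel code.
The struct and function names (clk_core_model, clk_hw_model,
clk_core_hw_looks_sane, count_corrupt_in_subtree) are hypothetical, the
4096-byte lower bound is an arbitrary assumption, and the only thing
taken from the real framework is that a clk_core and its clk_hw point at
each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures (hypothetical names).
 * Only the core <-> hw back-pointers and a child list are modeled. */
struct clk_core_model;

struct clk_hw_model {
	struct clk_core_model *core;	/* back-pointer to the owning core */
};

struct clk_core_model {
	const char *name;
	struct clk_hw_model *hw;
	struct clk_core_model *first_child;
	struct clk_core_model *next_sibling;
};

/* Returns true if core->hw looks like a usable pointer whose
 * back-pointer refers to this core again. */
bool clk_core_hw_looks_sane(const struct clk_core_model *core)
{
	uintptr_t addr = (uintptr_t)core->hw;

	/* Assumption: nothing valid lives in the first page, which
	 * rejects NULL and small values such as 0x3. */
	if (addr < 4096)
		return false;
	/* A misaligned pointer cannot point at a real clk_hw_model. */
	if (addr % _Alignof(struct clk_hw_model))
		return false;
	/* Only now is it reasonably safe to follow the pointer. */
	return core->hw->core == core;
}

/* Walks the subtree below (and including) core and returns the number
 * of nodes whose hw pointer fails the check. A failing node is not
 * descended into, because its child list cannot be trusted either. */
int count_corrupt_in_subtree(const struct clk_core_model *core)
{
	const struct clk_core_model *child;
	int bad = 0;

	assert(core != NULL);

	if (!clk_core_hw_looks_sane(core)) {
		printf("%s: clk_core=%p has implausible clk_hw=%p\n",
		       core->name, (const void *)core,
		       (const void *)core->hw);
		return 1;
	}

	for (child = core->first_child; child; child = child->next_sibling)
		bad += count_corrupt_in_subtree(child);

	return bad;
}
```

Building a small "xtal" -> "pwm_mux" tree and then overwriting the mux's
hw pointer with 0x3 makes count_corrupt_in_subtree() report one bad node
instead of crashing. A check like this only reports where the corruption
becomes visible, not what wrote the bad value.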

>> has anybody seen this crash before? I can help debugging/testing
>> potential fixes/trying out various things to solve this - just let me
>> know!
>>
>>
>> Regards,
>> Martin
>>
>>
>> [0] https://github.com/xdarklight/linux/tree/meson-mx-integration-4.12-20170513
>> [1] http://lists.infradead.org/pipermail/linux-amlogic/2017-May/003497.html
>> [2] https://github.com/torvalds/linux/commit/0fcc3ab23d7395f58e8ab0834e7913e2e4314a83
>> [3] https://paste.kde.org/pbefvmqgr
>
>
> [0] https://cgit.freedesktop.org/drm/drm-intel/commit/?h=drm-intel-nightly&id=73ac44749e71333bce7d2f8c0bbdc1bbc57dae1b