[PATCH v3] rtc: ac100: Fix ac100 determine rate bug
Philipp Rossak
embed3d at gmail.com
Fri Feb 16 04:49:35 PST 2018
On 16.02.2018 05:10, Chen-Yu Tsai wrote:
> On Fri, Feb 16, 2018 at 1:53 AM, Philipp Rossak <embed3d at gmail.com> wrote:
>>
>>
>> On 15.02.2018 15:11, Maxime Ripard wrote:
>>>
>>> On Wed, Feb 14, 2018 at 02:56:12PM +0100, Philipp Rossak wrote:
>>>>
>>>> This patch fixes a bug, that prevents the Allwinner A83T and the A80
>>>> from a successful boot.
>>>>
>>>> The bug is there since v4.16-rc1 and appeared after the clk branch was
>>>> merged.
>>>
>>>
>>> Out of curiosity, which patch has introduced this? I couldn't find any
>>> obvious match.
>>>
>>
>> I wasn't also n
>
> To be honest, I'm not sure why this is hitting you and not me.
> I have both A83T boards that have assigned-clock-rates set for
> the ac100 clock outputs for WiFi. I have them running 4.16-rc1
> and have not seen this. The device tree patches that add these
> are in 4.15.
>
Now it is getting curious ... .
I already mentioned that bug in the sunxi-irc and someone else was
hitting that problem also...
I tested it also with the same toolchain you are using (gcc 7.3.0-1
Debian), but that didn't made any difference.
I don't think that issue is related with the Hardware, but to be on the
save side: Which Hardware version of the BPI-M3 do you have? I have
version 1.2.
Can someone else can confirm this bug?
>>
>>>> You can find the shortend trace below:
>>>>
>>>> Unable to handle kernel NULL pointer dereference at virtual address
>>>> 00000000
>>>> pgd = (ptrval)
>>>> [00000000] *pgd=00000000
>>>> Internal error: Oops: 5 [#1] SMP ARM
>>>> Modules linked in:
>>>> CPU: 0 PID: 49 Comm: kworker/0:1 Not tainted 4.15.0-10190-gb89e32ccd1be
>>>> #2
>>>> Hardware name: Allwinner sun8i Family
>>>> Workqueue: events deferred_probe_work_func
>>>> PC is at clk_hw_get_rate+0x0/0x34
>>>> LR is at ac100_clkout_determine_rate+0x48/0x19c
>>>>
>>>> [ ... ]
>>>>
>>>> (clk_hw_get_rate) from (ac100_clkout_determine_rate+0x48/0x19c)
>>>> (ac100_clkout_determine_rate) from (clk_core_set_rate_nolock+0x3c/0x1a0)
>>>> (clk_core_set_rate_nolock) from (clk_set_rate+0x30/0x88)
>>>> (clk_set_rate) from (of_clk_set_defaults+0x200/0x364)
>>>> (of_clk_set_defaults) from (platform_drv_probe+0x18/0xb0)
>>>>
>>>> To fix that bug, we first check if the return of the
>>>> clk_hw_get_parent_by_index is non zero. If it is zero we skip that
>>>> clock parent.
>>>>
>>>> The BUG report could be found here: https://lkml.org/lkml/2018/2/10/198
>>>>
>>>> Fixes: 04940631b8d2 ("rtc: ac100: Add clk output support")
>>>>
>>>> Signed-off-by: Philipp Rossak <embed3d at gmail.com>
>>>> ---
>>>>
>>>> Changes in v3:
>>>> * add information when the bug appeared
>>>> * make the comment more clear
>>>> Changes in v2:
>>>> * add tag Fixes: ... to commit message
>>>> * add comment to if statement why we are doing this check
>>>>
>>>> drivers/rtc/rtc-ac100.c | 19 ++++++++++++++++++-
>>>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/rtc/rtc-ac100.c b/drivers/rtc/rtc-ac100.c
>>>> index 8ff9dc3fe5bf..2412aa2e8399 100644
>>>> --- a/drivers/rtc/rtc-ac100.c
>>>> +++ b/drivers/rtc/rtc-ac100.c
>>>> @@ -183,7 +183,24 @@ static int ac100_clkout_determine_rate(struct clk_hw
>>>> *hw,
>>>> for (i = 0; i < num_parents; i++) {
>>>> struct clk_hw *parent = clk_hw_get_parent_by_index(hw,
>>>> i);
>>>> - unsigned long tmp, prate = clk_hw_get_rate(parent);
>>>> + unsigned long tmp, prate;
>>>> +
>>>> + /*
>>>> + * The clock has two parents, one is a fixed clock which
>>>> is
>>>> + * internally registered by the ac100 driver. The other
>>>> parent
>>>> + * is a clock from the codec side of the chip, which we
>>>> + * properly declare and reference in the devicetree and
>>>> is
>>>> + * not implemented in any driver right now.
>>>> + * If the clock core looks for the parent of that second
>>>> + * missing clock, it can't one that is registered and
>>>> + * returns NULL.
>>>> + * Thus we need to check if the parent exists before
>>>> + * we get the parent rate.
>>>> + */
>>>> + if (!parent)
>>>> + continue;
>>>
>>>
>>> I'm sorry, but I still don't get it. When you register that clock, you
>>> will give it two parents. Why would that change during the life of the
>>> clock?
>>>
>>> This really looks like a workaround rather than an actual fix.
>>>
>>> Maxime
>>>
>> I agree this is more a workaround!
>> A proper solution/fix would be to define the devicetree correct like this:
>>
>> diff --git a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
>> b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
>> index 6550bf0e594b..6f56d429f17e 100644
>> --- a/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
>> +++ b/arch/arm/boot/dts/sun8i-a83t-bananapi-m3.dts
>> @@ -175,11 +175,18 @@
>> compatible = "x-powers,ac100-rtc";
>> interrupt-parent = <&r_intc>;
>> interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
>> - clocks = <&ac100_codec>;
>> + clocks = <&ac100_rtc_32k>;
>> #clock-cells = <1>;
>> clock-output-names = "cko1_rtc",
>> "cko2_rtc",
>> "cko3_rtc";
>> +
>> + ac100_rtc_32k: rtc-32k-oscillator {
>> + compatible = "fixed-clock";
>> + #clock-cells = <0>;
>> + clock-frequency = <32768>;
>> + clock-output-names = "ac100-rtc-32k";
>> + };
>> };
>> };
>> };
>>
>> What do you think about that solution?
>
> That's not quite right either. As I mentioned before, the
> RTC block has two clock inputs, one 4MHz signal from the
> codec block, and one 32.768 kHz signal from an external
> crystal. The original device tree binding describes the
> first one, and the 32.768 kHz clock was registered by
> the RTC driver internally.
>
> If you're going to add the crystal clock, you still need
> to keep the codec one. Note that this does not fix what
> Maxime is asking you. I've already provided an explanation:
>
> The clock core allows registering clocks with not-yet-existing
> clock parents. Parents are matches by string names. If no
> clock by that name is registered yet, the clock core simply
> orphans the new clock if the unregistered parent is its
> current parent or simply ignores that parent if its not the
> current parent. This is entirely valid and is what we are
> counting on here, as we haven't implemented the codec-side
> driver.
>
> Note we also sort of depend on this behavior for sunxi-ng
> clocks, where in some cases we get circular dependencies
> between clock blocks.
>
> BTW, since this is mostly related to the clk subsystem,
> please CC the clk maintainers as well.
>
Ok, I added also now the clk mailing list and maintainers.
As reference this here is the boot log [1].
Im not sure if this in relation to this bug:
But, already with the 4.15 kernel I get from time to time some oops
during runtime on the a31s (bpi-m2) and all are clock related and causes
a crash of the system.
Philipp
[1]: https://pastebin.com/5c7hxjsS
More information about the linux-arm-kernel
mailing list