[RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks

Icenowy Zheng icenowy at aosc.io
Sun May 28 03:09:18 PDT 2023


在 2023-05-27星期六的 15:27 +0200,Frank Oltmanns写道:
> I would like to bring your attention to the current process of
> setting the rate
> of an NKM clock. As it stands, when setting the rate of an NKM clock,
> the rate
> nearest but less than or equal to the requested rate is found,
> instead of the
> nearest rate. Moreover, ccu_nkm_find_best() is called multiple times
> (footnote
> [1]) when setting a rate, each time iterating over all combinations
> of n, k, and
> m.
> 
> In response to this, I propose the following refinements to optimize
> the NKM
> clock setting:
>  a. when finding the best rate use the nearest rate, even if it is
> greater than
>     the requested rate (PATCH 1)
>  b. utilize binary search to find the best rate by going through a
>     precalculated, ordered list of all meaningful combinations of n,
> k, and m
>     (PATCH 2)
> 
> To illustrate, consider an NKM clock with min_n = 1, max_n = 16,
> min_k = 2,
> max_k = 4, min_m = 1, and max_m = 16. With the current approach, we
> have to go
> through 1024 combinations, of which only 275 are actually meaningful
> (the
> remaining 749 are combinations that result in the same frequency as
> other
> combinations). So, when selecting from these sorted 275 combinations
> we find the
> closest rate after 9 instead of 1024 steps.
> 
> As an example, I calculated the table off-line for the pll-mipi clock
> of
> sun50i-a64 (PATCH 3). However, I have identified two other potential
> strategies:
>  2. calculate before first use and persist for subsequent uses (i.e.
> never free,
>     see footnote [2])
>  3. calculate before first use and free after setting the rate.
> 
> Each approach carries its own merits. The one I implemented is the
> most
> efficient in terms of computation time but consumes the most memory.
> The second
> saves compute time after the initial use, while the third minimizes
> memory usage
> at the cost of additional computation.
> 
> The motivation for these proposed changes lies in the current
> behavior of rate
> selection for NKM clocks, which doesn't observe the
> CLK_SET_RATE_PARENT flag.
> I.e. it does not select a different rate for the parent clock to find
> the
> optimal rate. I believe this is likely due to the fact that selecting
> a new rate
> is quite compute intense, as it would involve iterating or
> calculating rates for
> the parent clock and then testing each rate with different n, k, and
> m
> combinations.
> 
> As an example, if the parent is an NM clock, we would have to work
> through the
> combinations of the parent's factors (the parent's n) and divisor
> (the parent's
> m). This results in five nested loops to evaluate all possible rates,
> an effort
> that escalates if the parent could also influence the grandparent's
> rate. In my
> example case (sun50i-a64) the pll-mipi's parent (pll-video0) has 2048
> combinations of n and m, of which 1266 are meaningful because the
> others result
> in the same frequency for pll-video0.
> 
> If we can come up with a way to iterate over the possible rates of a
> parent,
> this would eventually allow us to make NKM clocks obey the
> CLK_SET_RATE_PARENT
> flag. Because it would only require 11,349 (9 * 1,266) steps instead
> of
> 2,097,152 (1,024 * 2,048).
> 
> Things I considered but don't have a clear answer to:
>  - Is there a specific reason, why currently only clock rates less
> than the
>    requested rate are considered when setting a new rate?

Well it's for preventing overrunning any hardware, either the SoC
itself or some other peripherals (e.g. the LCD panel will receive some
clock from the SoC).

>  - Do you think it is worth the memory and increased complexity to be
> able to
>    change the parent's clock rate?

Well, at least on Allwinner A64, changing multiple clocks may be needed
because its second PLL-VIDEO is quirky and the dotclocks of the two
outputs may need to be generated from only the same PLL-VIDEO0
(although LCD pipeline dotclock could be generated from PLL-MIPI, PLL-
MIPI's one possible input is PLL-VIDEO0).

> 
> I look forward to hearing your thoughts on these proposed changes.
> Thank you for
> your time and consideration.
> 
> Footnotes:
> [1] Multiple times because ccu_nkm_find_best is (indirectly) called
> from clk.c
> in
>  - clk_core_req_round_rate_nolock()
>  - clk_calc_new_rates() (which in turn is called three times for
> reasons that
>    currently elude me)
>  - clk_change_rate
> 
> [2] Actually, we could free the memory in a new ccu_nkm_terminate()
> function,
> which could become part of ccu_nkm_ops. But if my code searching
> skills don't
> betray me, there is currently no other clock that implements the
> terminate
> function.
> 
> Frank Oltmanns (3):
>   clk: sunxi-ng: nkm: Minimize difference when finding rate
>   clk: sunxi-ng: Implement precalculated NKM rate selection
>   clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-
> mipi
> 
>  drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 281
> ++++++++++++++++++++++++++
>  drivers/clk/sunxi-ng/ccu_mux.c        |   2 +-
>  drivers/clk/sunxi-ng/ccu_nkm.c        |  97 +++++++--
>  drivers/clk/sunxi-ng/ccu_nkm.h        |  26 +++
>  4 files changed, 385 insertions(+), 21 deletions(-)
> 




More information about the linux-arm-kernel mailing list