[RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
Icenowy Zheng
icenowy at aosc.io
Sun May 28 03:09:18 PDT 2023
On Sat, 2023-05-27 at 15:27 +0200, Frank Oltmanns wrote:
> I would like to draw your attention to the current process of setting
> the rate of an NKM clock. As it stands, when setting the rate of an NKM
> clock, the closest rate that is less than or equal to the requested
> rate is selected, instead of the nearest rate overall. Moreover,
> ccu_nkm_find_best() is called multiple times (footnote [1]) when
> setting a rate, each time iterating over all combinations of n, k, and
> m.
>
> In response to this, I propose the following refinements to optimize
> the NKM clock setting:
> a. when finding the best rate, use the nearest rate, even if it is
>    greater than the requested rate (PATCH 1)
> b. use binary search to find the best rate by going through a
>    precalculated, ordered list of all meaningful combinations of n, k,
>    and m (PATCH 2)
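
For illustration only, here is a hypothetical user-space sketch of idea (a); it is not code from the patch series. `nkm_find_best_sketch()`, `struct nkm_limits` and `struct nkm_choice` are invented names. It walks every (n, k, m) combination like the existing exhaustive search, and the `allow_above` flag switches between keeping only rates at or below the request (current behaviour) and accepting the nearest rate on either side (idea a).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical limits; names mirror the text, not the kernel's struct _ccu_nkm. */
struct nkm_limits {
	unsigned long min_n, max_n;
	unsigned long min_k, max_k;
	unsigned long min_m, max_m;
};

struct nkm_choice {
	unsigned long n, k, m, rate;
};

static struct nkm_choice nkm_find_best_sketch(unsigned long parent,
					      unsigned long rate,
					      const struct nkm_limits *l,
					      bool allow_above)
{
	struct nkm_choice best = { 0, 0, 0, 0 };
	unsigned long best_diff = ULONG_MAX;

	assert(l->min_m > 0); /* m is a divider */

	for (unsigned long k = l->min_k; k <= l->max_k; k++) {
		for (unsigned long n = l->min_n; n <= l->max_n; n++) {
			for (unsigned long m = l->min_m; m <= l->max_m; m++) {
				unsigned long tmp = parent * n * k / m;
				unsigned long diff;

				/* Current behaviour: never exceed the request. */
				if (tmp > rate && !allow_above)
					continue;

				diff = tmp > rate ? tmp - rate : rate - tmp;
				/* Strict '<' keeps the first combination found on ties. */
				if (diff < best_diff) {
					best_diff = diff;
					best = (struct nkm_choice){ n, k, m, tmp };
				}
			}
		}
	}
	return best; /* rate == 0 if no acceptable combination exists */
}
```

With parent = 10, n in 1..4, k = 2 and m in 1..3, a request for 29 yields 26 (n = 4, m = 3) when `allow_above` is false and 30 (n = 3, m = 2) when it is true.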
>
> To illustrate, consider an NKM clock with min_n = 1, max_n = 16,
> min_k = 2, max_k = 4, min_m = 1, and max_m = 16. With the current
> approach, we have to go through 768 (16 * 3 * 16) combinations, of
> which only 275 are actually meaningful (the remaining 493 result in
> the same frequency as other combinations). So, when selecting from
> these 275 sorted combinations, we find the closest rate after 9
> instead of 768 steps.
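
A minimal sketch of idea (b), assuming the table can be built by brute force and that any representative of a duplicate ratio is acceptable (qsort() is not stable, so which one survives is arbitrary). The struct and function names are hypothetical and do not come from the patch. Ordering by n*k/m via cross-multiplication keeps the comparison exact, and because parent * n * k / m never decreases along that order, a lower-bound binary search followed by one neighbour comparison yields the nearest rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table entry; not the structure used in the actual patch. */
struct nkm_combo {
	unsigned long n, k, m;
};

/* Order by n*k/m using cross-multiplication to avoid rounding. */
static int nkm_combo_cmp(const void *pa, const void *pb)
{
	const struct nkm_combo *a = pa, *b = pb;
	unsigned long lhs = a->n * a->k * b->m;
	unsigned long rhs = b->n * b->k * a->m;

	return (lhs > rhs) - (lhs < rhs);
}

/* Enumerate all combinations into 'out' (capacity 'cap'), sort them by
 * ratio and keep one entry per distinct ratio. Returns the number of
 * distinct ratios; *total receives the raw number of combinations. */
static size_t nkm_build_table(struct nkm_combo *out, size_t cap, size_t *total,
			      unsigned long min_n, unsigned long max_n,
			      unsigned long min_k, unsigned long max_k,
			      unsigned long min_m, unsigned long max_m)
{
	size_t count = 0, distinct = 0;

	for (unsigned long k = min_k; k <= max_k; k++)
		for (unsigned long n = min_n; n <= max_n; n++)
			for (unsigned long m = min_m; m <= max_m; m++) {
				assert(count < cap);
				out[count++] = (struct nkm_combo){ n, k, m };
			}

	qsort(out, count, sizeof(*out), nkm_combo_cmp);

	/* Equal ratios are now adjacent; compact them in place. */
	for (size_t i = 0; i < count; i++)
		if (distinct == 0 ||
		    nkm_combo_cmp(&out[distinct - 1], &out[i]) != 0)
			out[distinct++] = out[i];

	*total = count;
	return distinct;
}

static unsigned long nkm_rate(unsigned long parent, const struct nkm_combo *c)
{
	return parent * c->n * c->k / c->m;
}

/* Lower-bound search for the first entry with rate >= 'rate'; its left
 * neighbour is the only other candidate for the nearest rate. */
static size_t nkm_find_nearest(const struct nkm_combo *tbl, size_t len,
			       unsigned long parent, unsigned long rate,
			       unsigned int *steps)
{
	size_t lo = 0, hi = len;

	assert(len > 0);
	*steps = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*steps)++;
		if (nkm_rate(parent, &tbl[mid]) < rate)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == len)
		return len - 1;
	/* Here tbl[lo - 1] < rate <= tbl[lo], so neither subtraction wraps. */
	if (lo > 0 &&
	    rate - nkm_rate(parent, &tbl[lo - 1]) <=
	    nkm_rate(parent, &tbl[lo]) - rate)
		return lo - 1;
	return lo;
}
```

For the limits above (16 * 3 * 16 = 768 combinations), this sketch keeps 275 distinct ratios, and a search over those 275 entries takes at most 9 iterations.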
>
> As an example, I calculated the table offline for the pll-mipi clock
> of sun50i-a64 (PATCH 3); call this strategy 1. However, I have
> identified two other potential strategies:
> 2. calculate the table before first use and keep it for subsequent
>    uses (i.e. never free it, see footnote [2])
> 3. calculate the table before first use and free it after setting
>    the rate.
>
> Each approach has its own merits. The one I implemented is the most
> efficient in terms of computation time but consumes the most memory.
> The second saves compute time after the initial use, while the third
> minimizes memory usage at the cost of additional computation.
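
To make strategies 2 and 3 concrete, here is a hypothetical sketch of a per-clock cache; none of these names exist in the kernel, and the table contents are a placeholder for the real enumerate/sort/deduplicate step. The only difference between the two strategies is whether the release step frees the table; strategy 1 would replace both helpers with a static const array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-clock cache; names are illustrative only. */
struct nkm_table_cache {
	unsigned long *table;  /* NULL until first use */
	size_t len;
	bool persist;          /* true: strategy 2, false: strategy 3 */
	unsigned int builds;   /* how often the table was calculated */
};

static bool nkm_table_get(struct nkm_table_cache *c, size_t len)
{
	assert(len > 0);
	if (c->table)          /* already calculated: reuse it */
		return true;
	c->table = malloc(len * sizeof(*c->table));
	if (!c->table)
		return false;
	for (size_t i = 0; i < len; i++)
		c->table[i] = i;   /* placeholder for the real calculation */
	c->len = len;
	c->builds++;
	return true;
}

static void nkm_table_put(struct nkm_table_cache *c)
{
	if (c->persist)
		return;            /* strategy 2: keep it for the next call */
	free(c->table);        /* strategy 3: release the memory right away */
	c->table = NULL;
	c->len = 0;
}

/* One hypothetical set_rate() call: acquire, search, release. */
static bool nkm_set_rate_sketch(struct nkm_table_cache *c)
{
	if (!nkm_table_get(c, 275))
		return false;
	/* ... the binary search over c->table would happen here ... */
	nkm_table_put(c);
	return true;
}
```

Calling nkm_set_rate_sketch() three times calculates the table once with persist = true and three times with persist = false, which mirrors the compute-versus-memory trade-off described above.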
>
> The motivation for these proposed changes lies in the current rate
> selection behavior for NKM clocks, which doesn't observe the
> CLK_SET_RATE_PARENT flag, i.e. it does not select a different rate for
> the parent clock to find the optimal rate. I believe this is likely
> because selecting a new rate is quite compute-intensive, as it would
> involve iterating over or calculating rates for the parent clock and
> then testing each rate with different n, k, and m combinations.
>
> As an example, if the parent is an NM clock, we would have to work
> through the combinations of the parent's factor (the parent's n) and
> divisor (the parent's m). This results in five nested loops to
> evaluate all possible rates, an effort that escalates if the parent
> could also influence the grandparent's rate. In my example case
> (sun50i-a64), the pll-mipi's parent (pll-video0) has 2,048
> combinations of n and m, of which 1,266 are meaningful because the
> others result in the same frequency for pll-video0.
>
> If we can come up with a way to iterate over the possible rates of a
> parent, this would eventually allow us to make NKM clocks obey the
> CLK_SET_RATE_PARENT flag, because it would only require 11,394
> (9 * 1,266) steps instead of 1,572,864 (768 * 2,048).
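
The step count above assumes one binary search per candidate parent rate. A hypothetical sketch of that loop follows; it assumes the parent's distinct rates are already available as an array and that the child table holds ratios n*k/m sorted in ascending order. All names are invented for illustration, and ties are resolved towards the lower rate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child table entry: multiplier n*k (num) over divider m (den). */
struct nkm_ratio {
	unsigned long num, den;
};

static unsigned long ratio_rate(unsigned long parent, struct nkm_ratio r)
{
	return parent * r.num / r.den;
}

static unsigned long abs_diff(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/* Lower-bound binary search for one parent rate; on a tie the lower
 * rate wins, to avoid overclocking downstream hardware. */
static size_t nearest_child(const struct nkm_ratio *tbl, size_t len,
			    unsigned long parent, unsigned long target,
			    unsigned long *steps)
{
	size_t lo = 0, hi = len;

	assert(len > 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		(*steps)++;
		if (ratio_rate(parent, tbl[mid]) < target)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == len)
		return len - 1;
	if (lo > 0 && abs_diff(ratio_rate(parent, tbl[lo - 1]), target) <=
		      abs_diff(ratio_rate(parent, tbl[lo]), target))
		return lo - 1;
	return lo;
}

/* One binary search per candidate parent rate instead of nested loops
 * over every parent and child factor. */
static unsigned long best_with_parent(const unsigned long *parents, size_t np,
				      const struct nkm_ratio *tbl, size_t len,
				      unsigned long target,
				      unsigned long *best_parent,
				      size_t *best_idx, unsigned long *steps)
{
	unsigned long best_rate = 0;

	*steps = 0;
	for (size_t p = 0; p < np; p++) {
		size_t i = nearest_child(tbl, len, parents[p], target, steps);
		unsigned long r = ratio_rate(parents[p], tbl[i]);

		if (p == 0 || abs_diff(r, target) < abs_diff(best_rate, target)) {
			best_rate = r;
			*best_parent = parents[p];
			*best_idx = i;
		}
	}
	return best_rate;
}
```

With child ratios {1/2, 1, 3/2, 2, 3}, candidate parent rates 10 and 14 and a target of 25, it picks 28 (parent 14, ratio 2) after 6 comparisons, whereas parent 10 alone could only offer 20 or 30.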
>
> Things I considered but don't have a clear answer to:
> - Is there a specific reason why currently only clock rates less than
>   or equal to the requested rate are considered when setting a new
>   rate?
Well, it's to avoid overclocking any hardware, either the SoC itself or
other peripherals (e.g. the LCD panel, which receives a clock from the
SoC).
> - Do you think it is worth the memory and increased complexity to be
>   able to change the parent's clock rate?
Well, at least on Allwinner A64, changing multiple clocks may be needed,
because its second PLL-VIDEO is quirky, so the dotclocks of the two
outputs may have to be generated from the same PLL-VIDEO0 (although the
LCD pipeline's dotclock could be generated from PLL-MIPI, PLL-MIPI's
only possible input is PLL-VIDEO0).
>
> I look forward to hearing your thoughts on these proposed changes.
> Thank you for your time and consideration.
>
> Footnotes:
> [1] Multiple times because ccu_nkm_find_best() is (indirectly) called
>     from clk.c in
>     - clk_core_req_round_rate_nolock()
>     - clk_calc_new_rates() (which in turn is called three times for
>       reasons that currently elude me)
>     - clk_change_rate()
>
> [2] Actually, we could free the memory in a new ccu_nkm_terminate()
>     function, which could become part of ccu_nkm_ops. But if my code
>     searching skills don't betray me, no other clock currently
>     implements the terminate function.
>
> Frank Oltmanns (3):
>   clk: sunxi-ng: nkm: Minimize difference when finding rate
>   clk: sunxi-ng: Implement precalculated NKM rate selection
>   clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi
>
>  drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 281 ++++++++++++++++++++++++++
>  drivers/clk/sunxi-ng/ccu_mux.c        |   2 +-
>  drivers/clk/sunxi-ng/ccu_nkm.c        |  97 +++++++--
>  drivers/clk/sunxi-ng/ccu_nkm.h        |  26 +++
>  4 files changed, 385 insertions(+), 21 deletions(-)
>
More information about the linux-arm-kernel mailing list