broken devfreq simple_ondemand for Odroid XU3/4?

Willy Wolff willy.mh.wolff.ml at gmail.com
Wed Jun 24 04:01:17 EDT 2020


Hi Krzysztof,
Thanks for looking at it.

mem_gov is /sys/class/devfreq/10c20000.memory-controller/governor
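
As an illustration, the governor and current frequency behind that path can be
read with a few lines of Python. This is only a sketch: read_devfreq_state is a
hypothetical helper name, and it assumes the usual devfreq sysfs attribute
files (governor, cur_freq, available_governors) are present in that directory.

```python
from pathlib import Path

# Path taken from the message above; adjust it for other devfreq devices.
DEVFREQ_DIR = Path("/sys/class/devfreq/10c20000.memory-controller")


def read_devfreq_state(devfreq_dir=DEVFREQ_DIR):
    """Return the governor, current frequency (Hz) and available governors.

    Hypothetical helper: it only reads plain-text sysfs attributes.
    """
    devfreq_dir = Path(devfreq_dir)
    return {
        "governor": (devfreq_dir / "governor").read_text().strip(),
        "cur_freq": int((devfreq_dir / "cur_freq").read_text()),
        "available_governors":
            (devfreq_dir / "available_governors").read_text().split(),
    }


if __name__ == "__main__":
    if DEVFREQ_DIR.is_dir():
        print(read_devfreq_state())
    else:
        print(f"{DEVFREQ_DIR} not found (not running on the board?)")
```

Writing a governor name to the same governor file (as root) is what switches
between simple_ondemand and performance.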

Here are some numbers after increasing the running time:

Running using simple_ondemand:
Before:
     From  :   To                                                                                     
           : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
* 165000000:         0         0         0         0         0         0         0         4   4528600
  206000000:         5         0         0         0         0         0         0         0     57780
  275000000:         0         5         0         0         0         0         0         0     50060
  413000000:         0         0         5         0         0         0         0         0     46240
  543000000:         0         0         0         5         0         0         0         0     48970
  633000000:         0         0         0         0         5         0         0         0     47330
  728000000:         0         0         0         0         0         0         0         0         0
  825000000:         0         0         0         0         0         5         0         0    331300
Total transition : 34


After:
     From  :   To
           : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
* 165000000:         0         0         0         0         0         0         0         4   5098890
  206000000:         5         0         0         0         0         0         0         0     57780
  275000000:         0         5         0         0         0         0         0         0     50060
  413000000:         0         0         5         0         0         0         0         0     46240
  543000000:         0         0         0         5         0         0         0         0     48970
  633000000:         0         0         0         0         5         0         0         0     47330
  728000000:         0         0         0         0         0         0         0         0         0
  825000000:         0         0         0         0         0         5         0         0    331300
Total transition : 34

With a running time of:
LITTLE => 283.699 s (680.877 c per mem access)
big => 284.47 s (975.327 c per mem access)


And when I switch to the performance governor:
Before:
     From  :   To
           : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
  165000000:         0         0         0         0         0         0         0         5   5099040
  206000000:         5         0         0         0         0         0         0         0     57780
  275000000:         0         5         0         0         0         0         0         0     50060
  413000000:         0         0         5         0         0         0         0         0     46240
  543000000:         0         0         0         5         0         0         0         0     48970
  633000000:         0         0         0         0         5         0         0         0     47330
  728000000:         0         0         0         0         0         0         0         0         0
* 825000000:         0         0         0         0         0         5         0         0    331350
Total transition : 35

After:
     From  :   To
           : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
  165000000:         0         0         0         0         0         0         0         5   5099040
  206000000:         5         0         0         0         0         0         0         0     57780
  275000000:         0         5         0         0         0         0         0         0     50060
  413000000:         0         0         5         0         0         0         0         0     46240
  543000000:         0         0         0         5         0         0         0         0     48970
  633000000:         0         0         0         0         5         0         0         0     47330
  728000000:         0         0         0         0         0         0         0         0         0
* 825000000:         0         0         0         0         0         5         0         0    472980
Total transition : 35

With a running time of:
LITTLE: 68.8428 s (165.223 c per mem access)
big: 71.3268 s (244.549 c per mem access)
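
To put the two sets of timings side by side, the slowdown can be computed
directly from the numbers above. A small sketch (the RESULTS layout and the
slowdown helper are just illustrative names):

```python
# Run times (s) and cycles per memory access, copied from the results above.
RESULTS = {
    "LITTLE": {"simple_ondemand": (283.699, 680.877),
               "performance": (68.8428, 165.223)},
    "big": {"simple_ondemand": (284.47, 975.327),
            "performance": (71.3268, 244.549)},
}


def slowdown(cluster):
    """Return (run-time ratio, cycles-per-access ratio), ondemand vs performance."""
    t_od, c_od = RESULTS[cluster]["simple_ondemand"]
    t_perf, c_perf = RESULTS[cluster]["performance"]
    return t_od / t_perf, c_od / c_perf


for cluster in RESULTS:
    t_ratio, c_ratio = slowdown(cluster)
    print(f"{cluster}: {t_ratio:.2f}x run time, "
          f"{c_ratio:.2f}x cycles per access")
```

Both ratios come out at roughly 4x on each cluster, which is in the same
ballpark as the 825 MHz / 165 MHz = 5x gap between the highest and lowest
memory frequencies.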


I see some transitions, but none occurring during the benchmark.
I haven't dived into the code yet, but maybe the heuristic behind it is not
well defined? If you know how it works, that would be helpful before I dive
into it.
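
The before/after comparison can also be done mechanically by diffing the
time(ms) column of two trans_stat dumps. The following is a rough sketch, not
an existing tool: parse_trans_stat and residency_delta are hypothetical names,
the parser assumes the layout shown in the tables above, and the sample rows
are an abbreviated copy of the simple_ondemand tables.

```python
def parse_trans_stat(text):
    """Map each frequency (Hz) to its residency in ms (last column)."""
    residency = {}
    for line in text.splitlines():
        head, sep, rest = line.partition(":")
        freq = head.replace("*", "").strip()
        # Data rows look like "* 165000000: 0 0 ... 4528600";
        # the header rows and the "Total transition" line are skipped.
        if sep and freq.isdigit() and rest.split():
            residency[int(freq)] = int(rest.split()[-1])
    return residency


def residency_delta(before, after):
    """Return the time (ms) spent at each frequency between two dumps."""
    return {freq: after[freq] - before.get(freq, 0) for freq in after}


# Abbreviated rows from the simple_ondemand tables above.
BEFORE = """\
     From  :   To
           : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
* 165000000:         0         0         0         0         0         0         0         4   4528600
  206000000:         5         0         0         0         0         0         0         0     57780
  825000000:         0         0         0         0         0         5         0         0    331300
Total transition : 34
"""
AFTER = BEFORE.replace("4528600", "5098890")

delta = residency_delta(parse_trans_stat(BEFORE), parse_trans_stat(AFTER))
for freq, ms in sorted(delta.items()):
    print(f"{freq:>10} Hz: +{ms} ms")
```

For the simple_ondemand run, the only row whose residency grows is 165000000
(by 570290 ms), which matches the ~568 s of combined LITTLE and big run time.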

I ran your test as well, and indeed, it seems to work for a large chunk of
memory, and there is some delay before a transition happens (it seems to be
around 10 s). When you kill memtester, the frequency is reduced step by step,
every ~10 s.

Note that the timings shown above account for the critical path, and the code
loops on reads only; there is no write in the critical path.
Maybe memtester is doing writes and the devfreq heuristic uses only write info?
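
To make the access pattern concrete, here is a minimal sketch of a read-only
pointer chase. It is not the code from the benchmark repository: make_chain
and chase are hypothetical names, Sattolo's algorithm is simply one way to get
a random single-cycle permutation, and Python is used only to show the idea
(it will not give meaningful memory timings).

```python
import random


def make_chain(n, seed=0):
    """Build a random permutation forming one cycle (Sattolo's algorithm)."""
    rng = random.Random(seed)
    chain = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i, which guarantees a single n-cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def chase(chain, steps, start=0):
    """Follow the chain: each load depends on the previous one, no stores."""
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


chain = make_chain(1 << 16)
print(chase(chain, len(chain)))  # one full lap around the cycle
```

Since every index is visited exactly once per lap and the next address is
only known after the previous load completes, the loop issues dependent reads
and never writes to the array.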

Cheers,
Willy

On 2020-06-23-21-11-29, Krzysztof Kozlowski wrote:
> On Tue, Jun 23, 2020 at 09:02:38PM +0200, Krzysztof Kozlowski wrote:
> > On Tue, 23 Jun 2020 at 18:47, Willy Wolff <willy.mh.wolff.ml at gmail.com> wrote:
> > >
> > > Hi everybody,
> > >
> > > Is DVFS for the memory bus really working on the Odroid XU3/4 boards?
> > > Using a simple microbenchmark that does only memory accesses, memory DVFS
> > > does not seem to work properly:
> > >
> > > The microbenchmark does pointer chasing by following indices in an array.
> > > The indices in the array are set to follow a random pattern (defeating the
> > > prefetcher), which forces RAM accesses.
> > >
> > > git clone https://github.com/wwilly/benchmark.git \
> > >   && cd benchmark \
> > >   && source env.sh \
> > >   && ./bench_build.sh \
> > >   && bash source/scripts/test_dvfs_mem.sh
> > >
> > > Python 3, cmake and sudo rights are required.
> > >
> > > Results:
> > > DVFS CPU with performance governor
> > > mem_gov = simple_ondemand at 165000000 Hz in idle, should be bumped when the
> > > benchmark is running.
> > > - on the LITTLE cluster it takes 4.74308 s to run (683.004 c per memory access),
> > > - on the big cluster it takes 4.76556 s to run (980.343 c per memory access).
> > >
> > > While forcing DVFS memory bus to use performance governor,
> > > mem_gov = performance at 825000000 Hz in idle,
> > > - on the LITTLE cluster it takes 1.1451 s to run (164.894 c per memory access),
> > > - on the big cluster it takes 1.18448 s to run (243.664 c per memory access).
> > >
> > > The kernel used is the latest stable, 5.7.5, with the default exynos_defconfig.
> > 
> > Thanks for the report. A few thoughts:
> > 1. What is trans_stat saying? Besides the DMC driver, you can also check
> > all other devfreq devices (e.g. wcore) - maybe the devfreq events
> > (nocp) are not properly assigned?
> > 2. Try running the measurement for ~1 minute or longer. The counters
> > might have some delay (which would probably require fixing, but the
> > point is to narrow down the problem).
> > 3. What do you understand by "mem_gov"? Which device is it?
> 
> +Cc Lukasz who was working on this.
> 
> I just ran memtester and more-or-less ondemand works (at least it ramps
> up):
> 
> Before:
> /sys/class/devfreq/10c20000.memory-controller$ cat trans_stat
>      From  :   To
>            : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
> * 165000000:         0         0         0         0         0         0         0         0   1795950
>   206000000:         1         0         0         0         0         0         0         0      4770
>   275000000:         0         1         0         0         0         0         0         0     15540
>   413000000:         0         0         1         0         0         0         0         0     20780
>   543000000:         0         0         0         1         0         0         0         1     10760
>   633000000:         0         0         0         0         2         0         0         0     10310
>   728000000:         0         0         0         0         0         0         0         0         0
>   825000000:         0         0         0         0         0         2         0         0     25920
> Total transition : 9
> 
> 
> $ sudo memtester 1G
> 
> During memtester:
> /sys/class/devfreq/10c20000.memory-controller$ cat trans_stat
>      From  :   To
>            : 165000000 206000000 275000000 413000000 543000000 633000000 728000000 825000000   time(ms)
>   165000000:         0         0         0         0         0         0         0         1   1801490
>   206000000:         1         0         0         0         0         0         0         0      4770
>   275000000:         0         1         0         0         0         0         0         0     15540
>   413000000:         0         0         1         0         0         0         0         0     20780
>   543000000:         0         0         0         1         0         0         0         2     11090
>   633000000:         0         0         0         0         3         0         0         0     17210
>   728000000:         0         0         0         0         0         0         0         0         0
> * 825000000:         0         0         0         0         0         3         0         0    169020
> Total transition : 13
> 
> However, after killing memtester it stays at 633 MHz for a very long time
> and does not slow down. This is indeed weird...
> 
> Best regards,
> Krzysztof


