[PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Mark Rutland
mark.rutland at arm.com
Fri Jul 10 05:35:16 EDT 2020
On Fri, Jul 10, 2020 at 04:03:39PM +0800, Pingfan Liu wrote:
> On Thu, Jul 9, 2020 at 7:48 PM Mark Rutland <mark.rutland at arm.com> wrote:
> [...]
> >
> > IIUC that's a 0.3% improvement. It'd be worth putting these results in
> > the commit message.
> Sure, I will.
> >
> > Could you also try that with "perf bench sched messaging" as the
> > workload? As a microbenchmark, that might show the highest potential
> > benefit, and it'd be nice to have those figures too if possible.
> I have run this test 10 times, and will put the results in the
> commit log too. In summary, this microbenchmark shows about a 1.69%
> improvement with this patch.
Great; thanks for gathering this data!
Mark.
>
> Test data:
>
> 1. without this patch, total 0.707 sec for 10 times
>
> # perf stat -r 10 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.074 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.071 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.068 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.068 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.070 [sec]
>
> Performance counter stats for 'perf bench sched messaging' (10 runs):
>
>        3,102.15 msec task-clock        #   11.018 CPUs utilized    ( +- 0.47% )
>          16,468      context-switches  #    0.005 M/sec            ( +- 2.56% )
>           6,877      cpu-migrations    #    0.002 M/sec            ( +- 3.44% )
>          83,645      page-faults       #    0.027 M/sec            ( +- 0.05% )
>   6,440,897,966      cycles            #    2.076 GHz              ( +- 0.37% )
>   3,620,264,483      instructions      #     0.56 insn per cycle   ( +- 0.11% )
> <not supported>      branches
>      11,187,394      branch-misses                                 ( +- 0.73% )
>
> 0.28155 +- 0.00166 seconds time elapsed ( +- 0.59% )
>
> 2. with this patch, total 0.695 sec for 10 times
>
> # perf stat -r 10 perf bench sched messaging
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.069 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.070 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.071 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.069 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.072 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.066 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.069 [sec]
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
>
> Total time: 0.069 [sec]
>
> Performance counter stats for 'perf bench sched messaging' (10 runs):
>
>        3,098.48 msec task-clock        #   11.182 CPUs utilized    ( +- 0.38% )
>          15,485      context-switches  #    0.005 M/sec            ( +- 2.28% )
>           6,707      cpu-migrations    #    0.002 M/sec            ( +- 2.80% )
>          83,606      page-faults       #    0.027 M/sec            ( +- 0.00% )
>   6,435,068,186      cycles            #    2.077 GHz              ( +- 0.26% )
>   3,611,197,297      instructions      #     0.56 insn per cycle   ( +- 0.08% )
> <not supported>      branches
>      11,323,244      branch-misses                                 ( +- 0.51% )
>
> 0.277087 +- 0.000625 seconds time elapsed ( +- 0.23% )
>
>
> Thanks,
> Pingfan
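
For anyone wanting to re-check the figures quoted above, here is a minimal
sketch in Python that re-derives them from the listed run times. The list
names and the improvement_pct() helper are illustrative only (they are not
part of the patch or of perf). The "about 1.69%" figure is assumed to be
the relative reduction in the summed "Total time" values.

```python
# Per-run "Total time" values in seconds, copied from the quoted output.
without_patch = [0.074, 0.071, 0.068, 0.072, 0.070,
                 0.070, 0.072, 0.072, 0.068, 0.070]
with_patch = [0.069, 0.070, 0.070, 0.070, 0.071,
              0.069, 0.072, 0.066, 0.069, 0.069]


def improvement_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Summed benchmark-reported time over the 10 runs.
total_before = sum(without_patch)
total_after = sum(with_patch)
bench_gain = improvement_pct(total_before, total_after)

# Mean wall-clock time reported by perf stat ("seconds time elapsed").
elapsed_gain = improvement_pct(0.28155, 0.277087)

print(f"Total time: {total_before:.3f} s -> {total_after:.3f} s, "
      f"{bench_gain:.2f}% lower")
print(f"perf stat elapsed: {elapsed_gain:.2f}% lower")
```

The summed times reproduce the 0.707 sec and 0.695 sec totals. Note that the
per-run spread (0.066 to 0.074 sec) is larger than the difference between the
two means, so the percentage is best read as indicative.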