[PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path
Pingfan Liu
kernelfans at gmail.com
Fri Jul 10 04:03:39 EDT 2020
On Thu, Jul 9, 2020 at 7:48 PM Mark Rutland <mark.rutland at arm.com> wrote:
[...]
>
> IIUC that's a 0.3% improvement. It'd be worth putting these results in
> the commit message.
Sure, I will.
>
> Could you also try that with "perf bench sched messaging" as the
> workload? As a microbenchmark, that might show the highest potential
> benefit, and it'd be nice to have those figures too if possible.
I have run this test 10 times and will put the results in the
commit log too. In summary, this microbenchmark shows about a 1.69%
improvement with this patch.
Test data:
1. without this patch, total 0.707 sec for 10 runs
# perf stat -r 10 perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.074 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.071 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.068 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.068 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.070 [sec]
Performance counter stats for 'perf bench sched messaging' (10 runs):
     3,102.15 msec task-clock        #   11.018 CPUs utilized    ( +- 0.47% )
        16,468      context-switches #    0.005 M/sec            ( +- 2.56% )
         6,877      cpu-migrations   #    0.002 M/sec            ( +- 3.44% )
        83,645      page-faults      #    0.027 M/sec            ( +- 0.05% )
 6,440,897,966      cycles           #    2.076 GHz              ( +- 0.37% )
 3,620,264,483      instructions     #    0.56  insn per cycle   ( +- 0.11% )
 <not supported>    branches
    11,187,394      branch-misses                                ( +- 0.73% )
0.28155 +- 0.00166 seconds time elapsed ( +- 0.59% )
2. with this patch, total 0.695 sec for 10 runs
# perf stat -r 10 perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.069 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.071 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.069 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.066 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.069 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.069 [sec]
Performance counter stats for 'perf bench sched messaging' (10 runs):
     3,098.48 msec task-clock        #   11.182 CPUs utilized    ( +- 0.38% )
        15,485      context-switches #    0.005 M/sec            ( +- 2.28% )
         6,707      cpu-migrations   #    0.002 M/sec            ( +- 2.80% )
        83,606      page-faults      #    0.027 M/sec            ( +- 0.00% )
 6,435,068,186      cycles           #    2.077 GHz              ( +- 0.26% )
 3,611,197,297      instructions     #    0.56  insn per cycle   ( +- 0.08% )
 <not supported>    branches
    11,323,244      branch-misses                                ( +- 0.51% )
0.277087 +- 0.000625 seconds time elapsed ( +- 0.23% )
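For reference, the summary figures can be rechecked from the numbers above. The following is only a minimal Python sketch of that arithmetic, not part of the test run itself; the helper name improvement_percent is illustrative. It sums the ten "Total time" values for each case and also compares the two "seconds time elapsed" means reported by perf stat.

```python
# Per-run "Total time" values in seconds, copied from the two runs above.
without_patch = [0.074, 0.071, 0.068, 0.072, 0.070,
                 0.070, 0.072, 0.072, 0.068, 0.070]
with_patch = [0.069, 0.070, 0.070, 0.070, 0.071,
              0.069, 0.072, 0.066, 0.069, 0.069]


def improvement_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


total_before = sum(without_patch)
total_after = sum(with_patch)

# Improvement based on the summed benchmark "Total time" values.
bench_gain = improvement_percent(total_before, total_after)

# Improvement based on the mean "seconds time elapsed" from perf stat.
elapsed_gain = improvement_percent(0.28155, 0.277087)

print(f"total without patch: {total_before:.3f} s")
print(f"total with patch:    {total_after:.3f} s")
print(f"benchmark total improvement:   {bench_gain:.2f}%")
print(f"perf stat elapsed improvement: {elapsed_gain:.2f}%")
```

The summed totals give roughly 1.7%, in line with the figure quoted above, while the perf stat elapsed time gives a slightly smaller gain of about 1.6%. With differences of this size, the reported +- variation of each measurement should be kept in mind.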
Thanks,
Pingfan