[PATCH] arm64/mm: save memory access in check_and_switch_context() fast switch path

Pingfan Liu kernelfans at gmail.com
Fri Jul 10 04:03:39 EDT 2020


On Thu, Jul 9, 2020 at 7:48 PM Mark Rutland <mark.rutland at arm.com> wrote:
[...]
>
> IIUC that's a 0.3% improvement. It'd be worth putting these results in
> the commit message.
Sure, I will.
>
> Could you also try that with "perf bench sched messaging" as the
> workload? As a microbenchmark, that might show the highest potential
> benefit, and it'd be nice to have those figures too if possible.
I have run this test 10 times and will put the results in the
commit log too. In summary, this microbenchmark shows about a 1.69%
improvement with this patch.

Test data:

1. Without this patch: total 0.707 sec over 10 runs

# perf stat -r 10 perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.074 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.071 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.068 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.068 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.070 [sec]

 Performance counter stats for 'perf bench sched messaging' (10 runs):

          3,102.15 msec task-clock                #   11.018 CPUs utilized            ( +-  0.47% )
            16,468      context-switches          #    0.005 M/sec                    ( +-  2.56% )
             6,877      cpu-migrations            #    0.002 M/sec                    ( +-  3.44% )
            83,645      page-faults               #    0.027 M/sec                    ( +-  0.05% )
     6,440,897,966      cycles                    #    2.076 GHz                      ( +-  0.37% )
     3,620,264,483      instructions              #    0.56  insn per cycle           ( +-  0.11% )
   <not supported>      branches
        11,187,394      branch-misses                                                 ( +-  0.73% )

           0.28155 +- 0.00166 seconds time elapsed  ( +-  0.59% )

2. With this patch: total 0.695 sec over 10 runs

# perf stat -r 10 perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.069 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.070 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.071 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.069 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.072 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.066 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.069 [sec]
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run

     Total time: 0.069 [sec]

 Performance counter stats for 'perf bench sched messaging' (10 runs):

          3,098.48 msec task-clock                #   11.182 CPUs utilized            ( +-  0.38% )
            15,485      context-switches          #    0.005 M/sec                    ( +-  2.28% )
             6,707      cpu-migrations            #    0.002 M/sec                    ( +-  2.80% )
            83,606      page-faults               #    0.027 M/sec                    ( +-  0.00% )
     6,435,068,186      cycles                    #    2.077 GHz                      ( +-  0.26% )
     3,611,197,297      instructions              #    0.56  insn per cycle           ( +-  0.08% )
   <not supported>      branches
        11,323,244      branch-misses                                                 ( +-  0.51% )

          0.277087 +- 0.000625 seconds time elapsed  ( +-  0.23% )
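For anyone who wants to re-check the summary figure, here is a small Python sketch (a helper for this thread only, not part of the patch or of perf) that recomputes the relative improvement from the per-run "Total time" values and from perf's mean elapsed time in the two logs above. The function name relative_improvement is hypothetical; the inputs are copied verbatim from the output.

```python
# Per-run "Total time" values (seconds) copied from the two test logs above.
without_patch = [0.074, 0.071, 0.068, 0.072, 0.070,
                 0.070, 0.072, 0.072, 0.068, 0.070]
with_patch = [0.069, 0.070, 0.070, 0.070, 0.071,
              0.069, 0.072, 0.066, 0.069, 0.069]


def relative_improvement(before: float, after: float) -> float:
    """Percentage reduction in run time when going from `before` to `after`."""
    return (before - after) / before * 100.0


# Summed benchmark times over the 10 runs of each kernel.
total_before = sum(without_patch)
total_after = sum(with_patch)
bench_gain = relative_improvement(total_before, total_after)

# Mean "seconds time elapsed" reported by `perf stat -r 10` for each kernel.
elapsed_before = 0.28155
elapsed_after = 0.277087
elapsed_gain = relative_improvement(elapsed_before, elapsed_after)

print(f"sched/messaging total: {total_before:.3f} s -> {total_after:.3f} s "
      f"({bench_gain:.2f}% faster)")
print(f"perf stat elapsed: {elapsed_before:.5f} s -> {elapsed_after:.6f} s "
      f"({elapsed_gain:.2f}% faster)")
```

This gives roughly 1.70% for the summed benchmark totals (the 1.69% quoted above is the same ratio, truncated rather than rounded) and roughly 1.59% for perf's mean elapsed time.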


Thanks,
Pingfan



More information about the linux-arm-kernel mailing list