[PATCH] arm64: entry: Improve the performance of system calls

Zhen Lei thunder.leizhen at huawei.com
Fri Sep 3 05:19:50 PDT 2021


Commit 582f95835a8f ("arm64: entry: convert el0_sync to C") converted lots
of functions from assembly to C, this greatly improves readability. But
el0_svc()/el0_svc_compat() is in response to system call requests from
user mode and may be in the hot path.

Although the SVC is in the first case of the switch statement in C, the
compiler optimizes the switch statement as a whole, and does not give SVC
a small boost.

Use "likely()" to help SVC directly invoke its handler after a simple
judgment to avoid entering the switch table lookup process.

After:
0000000000000ff0 <el0t_64_sync_handler>:
     ff0:       d503245f        bti     c
     ff4:       d503233f        paciasp
     ff8:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
     ffc:       910003fd        mov     x29, sp
    1000:       d5385201        mrs     x1, esr_el1
    1004:       531a7c22        lsr     w2, w1, #26
    1008:       f100545f        cmp     x2, #0x15
    100c:       540000a1        b.ne    1020 <el0t_64_sync_handler+0x30>
    1010:       97fffe14        bl      860 <el0_svc>
    1014:       a8c17bfd        ldp     x29, x30, [sp], #16
    1018:       d50323bf        autiasp
    101c:       d65f03c0        ret
    1020:       f100705f        cmp     x2, #0x1c

Execute "./lat_syscall null" on my board (BogoMIPS : 200.00), it can save
about 10ns.

Before:
Simple syscall: 0.2365 microseconds
Simple syscall: 0.2354 microseconds
Simple syscall: 0.2339 microseconds

After:
Simple syscall: 0.2255 microseconds
Simple syscall: 0.2254 microseconds
Simple syscall: 0.2256 microseconds

Signed-off-by: Zhen Lei <thunder.leizhen at huawei.com>
---
 arch/arm64/kernel/entry-common.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
index 32f9796c4ffe77b..062eb5a895ec6f3 100644
--- a/arch/arm64/kernel/entry-common.c
+++ b/arch/arm64/kernel/entry-common.c
@@ -607,11 +607,14 @@ static void noinstr el0_fpac(struct pt_regs *regs, unsigned long esr)
 asmlinkage void noinstr el0t_64_sync_handler(struct pt_regs *regs)
 {
 	unsigned long esr = read_sysreg(esr_el1);
+	unsigned long ec = ESR_ELx_EC(esr);
 
-	switch (ESR_ELx_EC(esr)) {
-	case ESR_ELx_EC_SVC64:
+	if (likely(ec == ESR_ELx_EC_SVC64)) {
 		el0_svc(regs);
-		break;
+		return;
+	}
+
+	switch (ec) {
 	case ESR_ELx_EC_DABT_LOW:
 		el0_da(regs, esr);
 		break;
@@ -730,11 +733,14 @@ static void noinstr el0_svc_compat(struct pt_regs *regs)
 asmlinkage void noinstr el0t_32_sync_handler(struct pt_regs *regs)
 {
 	unsigned long esr = read_sysreg(esr_el1);
+	unsigned long ec = ESR_ELx_EC(esr);
 
-	switch (ESR_ELx_EC(esr)) {
-	case ESR_ELx_EC_SVC32:
+	if (likely(ec == ESR_ELx_EC_SVC32)) {
 		el0_svc_compat(regs);
-		break;
+		return;
+	}
+
+	switch (ec) {
 	case ESR_ELx_EC_DABT_LOW:
 		el0_da(regs, esr);
 		break;
-- 
2.25.1




More information about the linux-arm-kernel mailing list