[RFC] arm64: use different compiler inlining options for arm64 kernel builds
Nafees Ahmed Abdul
nafeabd at amazon.com
Fri Jan 26 08:13:30 PST 2024
It was observed that compiling the kernel with the following compiler
inlining options improved performance by up to 4% across the tested
workloads.
The table below shows the performance gains over the default kernel for
a few of the workloads we tested.
| Workload | Improvement |
|-----------|-------------|
| Nginx | 4% |
| MongoDB | 3% |
| Memcached | 3% |
| Redis | 2% |
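As a rough single-number summary of the table, assume each percentage is a throughput gain over the default kernel. The geometric mean of the four speedup ratios is then about 3%:

```latex
\left(1.04 \times 1.03 \times 1.03 \times 1.02\right)^{1/4}
  = 1.12540272^{1/4} \approx 1.0300
```

This is roughly a 3.0% average gain across these four workloads. The 4% figure above is the best case (Nginx).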
The selected options increase the inlining of kernel functions, which
improves workload performance. We arrived at these values by reviewing
the inlining parameters GCC uses at its different optimization levels
(-O1, -O2 and -O3).
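The sketch below is a deliberately simplified toy model, not GCC's real inliner (the actual heuristics in GCC's IPA inliner are far more involved). It shows how the GCC documentation describes the five knobs interacting:

- early-inlining-insns bounds the growth allowed in the early inliner.
- max-inline-insns-single limits functions declared inline.
- max-inline-insns-auto limits functions the compiler considers on its own.
- inline-heuristics-hint-percent scales those limits when hints say inlining is very profitable.
- inline-min-speedup lets a large estimated speedup bypass both size limits.

The names `struct inline_params`, `patch_params`, `toy_early_inline` and `toy_late_inline` are hypothetical. "growth" is an assumed size estimate in GCC's internal instruction units.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the --param values set by this patch. */
struct inline_params {
	int early_inlining_insns; /* --param=early-inlining-insns */
	int hint_percent;         /* --param=inline-heuristics-hint-percent */
	int min_speedup;          /* --param=inline-min-speedup (percent) */
	int max_insns_auto;       /* --param=max-inline-insns-auto */
	int max_insns_single;     /* --param=max-inline-insns-single */
};

/* Values taken from the arch/arm64/Makefile hunk of this patch. */
const struct inline_params patch_params = {
	.early_inlining_insns = 14,
	.hint_percent = 600,
	.min_speedup = 15,
	.max_insns_auto = 30,
	.max_insns_single = 200,
};

/* Toy early inliner: accept a callee only if growth stays small. */
bool toy_early_inline(const struct inline_params *p, int growth)
{
	return growth <= p->early_inlining_insns;
}

/* Toy main inliner (NOT GCC's algorithm, just the documented knobs). */
bool toy_late_inline(const struct inline_params *p, int growth,
		     bool declared_inline, bool profitable_hint,
		     int est_speedup_percent)
{
	int limit;

	assert(p->hint_percent > 0);

	/* A large enough estimated speedup bypasses the size limits. */
	if (est_speedup_percent > p->min_speedup)
		return true;

	/* "single" limit for functions marked inline, "auto" otherwise. */
	limit = declared_inline ? p->max_insns_single : p->max_insns_auto;

	/* Profitability hints scale the limit by hint_percent / 100. */
	if (profitable_hint)
		limit = limit * p->hint_percent / 100;

	return growth <= limit;
}
```

In this model, a 50-unit callee that is not declared inline is rejected under the plain auto limit of 30. The same callee is accepted when a profitability hint scales that limit to 180.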
Benchmarking was performed on AWS Graviton2 and Graviton3 instances
running the 5.10 and 6.1 long-term support (LTS) kernels, built with
GCC 10 and 11.
The patch also adds the noinline attribute to a couple of functions so
that they remain separate, traceable symbols and the existing kselftests
for the ftrace target still pass.
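The userspace sketch below illustrates the effect of noinline. The kernel's `noinline` macro expands to GCC's `noinline` function attribute, and the same expansion is reproduced here. `traced_helper` and `traced_caller` are hypothetical stand-ins for a probed function and its caller.

```c
#include <assert.h>
#include <limits.h>

/* Same expansion as the kernel's noinline: GCC's noinline attribute. */
#define noinline __attribute__((__noinline__))

/*
 * Hypothetical function that a tracer must find as its own symbol.
 * Without noinline, aggressive inlining could fold it into its caller.
 */
static noinline long traced_helper(long x)
{
	/* Keep this toy example free of signed overflow. */
	assert(x < LONG_MAX / 2 && x > LONG_MIN / 2);
	return x * 2;
}

/* Hypothetical caller: the helper stays an out-of-line call. */
long traced_caller(long x)
{
	return traced_helper(x) + 1;
}
```

Compiling this with `gcc -O3 -S` should show `traced_helper` emitted as a separate function that is called from `traced_caller`, rather than folded into it.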
Signed-off-by: Nafees Ahmed Abdul <nafeabd at amazon.com>
---
arch/arm64/Makefile | 6 ++++++
fs/namei.c | 2 +-
fs/open.c | 2 +-
3 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 47ecc4cff9d2..4ad2c7d0008d 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -101,6 +101,12 @@ KBUILD_CFLAGS += -Wa,-march=$(asm-arch) \
-DARM64_ASM_ARCH='"$(asm-arch)"'
endif

+ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
+KBUILD_CFLAGS += --param=early-inlining-insns=14 --param=inline-heuristics-hint-percent=600 \
+ --param=inline-min-speedup=15 --param=max-inline-insns-auto=30 \
+ --param=max-inline-insns-single=200
+endif
+
ifeq ($(CONFIG_SHADOW_CALL_STACK), y)
KBUILD_CFLAGS += -ffixed-x18
endif
diff --git a/fs/namei.c b/fs/namei.c
index 71c13b2990b4..58e8f462c137 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -4445,7 +4445,7 @@ SYSCALL_DEFINE1(unlink, const char __user *, pathname)
* On non-idmapped mounts or if permission checking is to be performed on the
* raw inode simply passs @nop_mnt_idmap.
*/
-int vfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
+noinline int vfs_symlink(struct mnt_idmap *idmap, struct inode *dir,
struct dentry *dentry, const char *oldname)
{
int error;
diff --git a/fs/open.c b/fs/open.c
index 02dc608d40d8..afec5d0e30d3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1421,7 +1421,7 @@ struct file *file_open_root(const struct path *root,
}
EXPORT_SYMBOL(file_open_root);

-static long do_sys_openat2(int dfd, const char __user *filename,
+static noinline long do_sys_openat2(int dfd, const char __user *filename,
struct open_how *how)
{
struct open_flags op;
--
2.40.1
More information about the linux-arm-kernel mailing list