[PATCH v3 00/10] sysctl: Remove sentinel elements from kernel dir
Joel Granados via B4 Relay
devnull+j.granados.samsung.com at kernel.org
Thu Mar 28 08:44:01 PDT 2024
From: Joel Granados <j.granados at samsung.com>
What?
These commits remove the sentinel element (last empty element) from the
sysctl arrays of all the files under the "kernel/" directory that use a
sysctl array for registration. The merging of the preparation patches
[1] to mainline allows us to remove sentinel elements without changing
behavior. This is safe because the sysctl registration code
(register_sysctl() and friends) uses the array size in addition to
checking for a sentinel [2].
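As a rough userspace illustration of why a size-aware walk makes the
sentinel redundant, here is a minimal sketch. The names demo_ctl_table,
DEMO_ARRAY_SIZE and demo_count_entries are hypothetical and only mimic
the idea; this is not the code in proc_sysctl.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ctl_table; only procname matters here. */
struct demo_ctl_table {
	const char *procname;
	int value;
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Old style: a trailing empty element marks the end of the array. */
static const struct demo_ctl_table with_sentinel[] = {
	{ .procname = "foo", .value = 1 },
	{ .procname = "bar", .value = 2 },
	{ .procname = NULL }
};

/* New style: same entries, no sentinel. */
static const struct demo_ctl_table without_sentinel[] = {
	{ .procname = "foo", .value = 1 },
	{ .procname = "bar", .value = 2 },
};

/*
 * Count usable entries: stop at the array size or at a NULL procname,
 * whichever comes first. Both array styles give the same answer.
 */
static size_t demo_count_entries(const struct demo_ctl_table *table,
				 size_t size)
{
	size_t n = 0;

	assert(table != NULL);
	while (n < size && table[n].procname != NULL)
		n++;
	return n;
}
```

Calling demo_count_entries(without_sentinel,
DEMO_ARRAY_SIZE(without_sentinel)) visits the same two entries as the
sentinel-terminated array, while the array itself is one element
smaller.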
Why?
By removing the sysctl sentinel elements we avoid kernel bloat as
ctl_table arrays get moved out of kernel/sysctl.c into their own
respective subsystems. This move was started long ago to avoid merge
conflicts; the sentinel removal came later, after Matthew Wilcox
suggested it to avoid bloating the kernel by one element per array as
arrays moved out. This patchset reduces the build-time size of the
kernel and the run-time memory footprint by about 64 bytes per declared
ctl_table array (more info here [5]).
When are we done?
There are 4 patchsets (25 commits [3]) still outstanding before the
sentinels are completely removed: files under "net/", files under
"kernel/" (this patchset), misc dirs (files under mm/, security/ and
others), and the final set that removes the unneeded check for
->procname == NULL.
Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Ran this through 0-day with no errors or warnings
Savings in vmlinux:
A total of 64 bytes per sentinel is saved after removal; I measured on
x86_64 to give an idea of the aggregated savings. The actual savings
will depend on the individual kernel configuration.
* bloat-o-meter
- The "yesall" config saves 1984 bytes [6]
- A reduced config [4] saves 1027 bytes [7]
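The two totals follow from the per-sentinel figure. A small sketch of
the arithmetic, assuming 64 bytes per sentinel as stated above
(demo_expected_savings is a hypothetical helper, not part of the
series):

```c
#include <assert.h>

/* Size of one removed sentinel on x86_64, as reported above. */
enum { SENTINEL_BYTES = 64 };

/*
 * Expected shrink when `tables` arrays each lose one sentinel, plus
 * `extra_bytes` of unrelated code-size reduction.
 */
static long demo_expected_savings(long tables, long extra_bytes)
{
	assert(tables >= 0 && extra_bytes >= 0);
	return tables * SENTINEL_BYTES + extra_bytes;
}
```

With the listings in [6] and [7]: 31 tables at 64 bytes each give 1984
bytes for "yesall"; 16 entries at 64 bytes plus the 3 bytes shaved off
sched_core_sysctl_init give 1027 bytes for the reduced config.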
Savings in allocated memory:
None in this set, but savings will occur when the superfluous
allocations are removed from proc_sysctl.c. I include it here for
context. The estimated savings during boot for config [4] are 6272
bytes. See [8] for how to measure it.
Comments/feedback greatly appreciated
Changes in v3:
- Rebased to v6.9-rc1
- Wrote a shorter cover letter
- Removed willy at infradead.org from cc
- Link to v2: https://lore.kernel.org/r/20240104-jag-sysctl_remove_empty_elem_kernel-v2-0-836cc04e00ec@samsung.com
Changes in v2:
- No functional changes; I re-sent it because I did not see it in the
latest sysctl-next. It might be a bit too late to include it in the
6.7 release, but this v2 can be used for 6.8 when it comes out.
- Rebased on top of v6.7-rc6
- Added trailers to the relevant commits.
- Link to v1: https://lore.kernel.org/r/20231107-jag-sysctl_remove_empty_elem_kernel-v1-0-e4ce1388dfa0@samsung.com
Best
Joel
[1] https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
[2] https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/tag/?h=sysctl_remove_empty_elem_v5
[4] https://gist.github.com/Joelgranados/feaca7af5537156ca9b73aeaec093171
[5]
Links Related to the ctl_table sentinel removal:
* Good summaries from Luis:
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
https://lore.kernel.org/all/ZMFizKFkVxUFtSqa@bombadil.infradead.org/
* Patches adjusting sysctl register calls:
https://lore.kernel.org/all/20230302204612.782387-1-mcgrof@kernel.org/
https://lore.kernel.org/all/20230302202826.776286-1-mcgrof@kernel.org/
* Discussions about expectations and approach
https://lore.kernel.org/all/20230321130908.6972-1-frank.li@vivo.com
https://lore.kernel.org/all/20220220060626.15885-1-tangmeng@uniontech.com
[6]
add/remove: 0/0 grow/shrink: 0/31 up/down: 0/-1984 (-1984)
Function                         old     new   delta
watchdog_sysctls                 576     512     -64
watchdog_hardlockup_sysctl       128      64     -64
vm_table                        1344    1280     -64
uts_kern_table                   448     384     -64
usermodehelper_table             192     128     -64
user_table                       832     768     -64
user_event_sysctls               128      64     -64
timer_sysctl                     128      64     -64
signal_debug_table               128      64     -64
seccomp_sysctl_table             192     128     -64
sched_rt_sysctls                 256     192     -64
sched_fair_sysctls               256     192     -64
sched_energy_aware_sysctls       128      64     -64
sched_dl_sysctls                 192     128     -64
sched_core_sysctls               384     320     -64
sched_autogroup_sysctls          128      64     -64
printk_sysctls                   512     448     -64
pid_ns_ctl_table_vm              128      64     -64
pid_ns_ctl_table                 128      64     -64
latencytop_sysctl                128      64     -64
kprobe_sysctls                   128      64     -64
kexec_core_sysctls               256     192     -64
kern_table                      2560    2496     -64
kern_reboot_table                192     128     -64
kern_panic_table                 192     128     -64
kern_exit_table                  128      64     -64
kern_delayacct_table             128      64     -64
kern_acct_table                  128      64     -64
hung_task_sysctls                448     384     -64
ftrace_sysctls                   128      64     -64
bpf_syscall_table                192     128     -64
Total: Before=429912331, After=429910347, chg -0.00%
[7]
add/remove: 0/1 grow/shrink: 0/16 up/down: 0/-1027 (-1027)
Function                         old     new   delta
sched_core_sysctl_init            39      36      -3
vm_table                        1024     960     -64
uts_kern_table                   448     384     -64
usermodehelper_table             192     128     -64
user_table                       704     640     -64
signal_debug_table               128      64     -64
seccomp_sysctl_table             192     128     -64
sched_rt_sysctls                 256     192     -64
sched_fair_sysctls               128      64     -64
sched_dl_sysctls                 192     128     -64
sched_core_sysctls                64       -     -64
printk_sysctls                   512     448     -64
pid_ns_ctl_table_vm              128      64     -64
kern_table                      1920    1856     -64
kern_reboot_table                192     128     -64
kern_panic_table                 128      64     -64
kern_exit_table                  128      64     -64
Total: Before=8522228, After=8521201, chg -0.01%
[8]
To measure the in-memory savings, apply this on top of this patchset:
"
diff --git i/fs/proc/proc_sysctl.c w/fs/proc/proc_sysctl.c
index 37cde0efee57..896c498600e8 100644
--- i/fs/proc/proc_sysctl.c
+++ w/fs/proc/proc_sysctl.c
@@ -966,6 +966,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
table[0].procname = new_name;
table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
init_header(&new->header, set->dir.header.root, set, node, table, 1);
+ printk("%zu sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
return new;
}
@@ -1189,6 +1190,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, s>
link_name += len;
link++;
}
+ printk("%zu sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
init_header(links, dir->header.root, dir->header.set, node, link_table,
head->ctl_table_size);
links->nreg = nr_entries;
"
and then run the following bash script on the booted kernel:
accum=0
for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do
accum=$((accum + n))
done
echo $accum
---
Signed-off-by: Joel Granados <j.granados at samsung.com>
---
Joel Granados (10):
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
umh: Remove the now superfluous sentinel elements from ctl_table array
ftrace: Remove the now superfluous sentinel elements from ctl_table array
timekeeping: Remove the now superfluous sentinel elements from ctl_table array
seccomp: Remove the now superfluous sentinel elements from ctl_table array
scheduler: Remove the now superfluous sentinel elements from ctl_table array
printk: Remove the now superfluous sentinel elements from ctl_table array
kprobes: Remove the now superfluous sentinel elements from ctl_table array
delayacct: Remove the now superfluous sentinel elements from ctl_table array
bpf: Remove the now superfluous sentinel elements from ctl_table array
kernel/acct.c | 1 -
kernel/bpf/syscall.c | 1 -
kernel/delayacct.c | 1 -
kernel/exit.c | 1 -
kernel/hung_task.c | 1 -
kernel/kexec_core.c | 1 -
kernel/kprobes.c | 1 -
kernel/latencytop.c | 1 -
kernel/panic.c | 1 -
kernel/pid_namespace.c | 1 -
kernel/pid_sysctl.h | 1 -
kernel/printk/sysctl.c | 1 -
kernel/reboot.c | 1 -
kernel/sched/autogroup.c | 1 -
kernel/sched/core.c | 1 -
kernel/sched/deadline.c | 1 -
kernel/sched/fair.c | 1 -
kernel/sched/rt.c | 1 -
kernel/sched/topology.c | 1 -
kernel/seccomp.c | 1 -
kernel/signal.c | 1 -
kernel/stackleak.c | 1 -
kernel/sysctl.c | 2 --
kernel/time/timer.c | 1 -
kernel/trace/ftrace.c | 1 -
kernel/trace/trace_events_user.c | 1 -
kernel/ucount.c | 3 +--
kernel/umh.c | 1 -
kernel/utsname_sysctl.c | 1 -
kernel/watchdog.c | 2 --
30 files changed, 1 insertion(+), 33 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231107-jag-sysctl_remove_empty_elem_kernel-7de90cfd0c0a
Best regards,
--
Joel Granados <j.granados at samsung.com>