[PATCH v3 00/10] sysctl: Remove sentinel elements from kernel dir

Joel Granados via B4 Relay devnull+j.granados.samsung.com at kernel.org
Thu Mar 28 08:44:01 PDT 2024


From: Joel Granados <j.granados at samsung.com>

What?
These commits remove the sentinel element (last empty element) from the
sysctl arrays of all the files under the "kernel/" directory that use a
sysctl array for registration. The merging of the preparation patches
[1] to mainline allows us to remove sentinel elements without changing
behavior. This is safe because the sysctl registration code
(register_sysctl() and friends) use the array size in addition to
checking for a sentinel [2].

Why?
By removing the sysctl sentinel elements we avoid kernel bloat as
ctl_table arrays get moved out of kernel/sysctl.c into their own
respective subsystems. This move was started long ago to avoid merge
conflicts; the sentinel removal bit came after Mathew Wilcox suggested
it to avoid bloating the kernel by one element as arrays moved out. This
patchset will reduce the overall build time size of the kernel and run
time memory bloat by about ~64 bytes per declared ctl_table array (more
info here [5]).

When are we done?
There are 4 patchests (25 commits [3]) that are still outstanding to
completely remove the sentinels: files under "net/", files under
"kernel/" (this patchset) dir, misc dirs (files under mm/ security/ and
others) and the final set that removes the unneeded check for ->procname
== NULL.

Testing:
* Ran sysctl selftests (./tools/testing/selftests/sysctl/sysctl.sh)
* Ran this through 0-day with no errors or warnings

Savings in vmlinux:
  A total of 64 bytes per sentinel is saved after removal; I measured in
  x86_64 to give an idea of the aggregated savings. The actual savings
  will depend on individual kernel configuration.
    * bloat-o-meter
        - The "yesall" config saves 1984 bytes [6]
        - A reduced config [4] saves 1027 bytes [7]

Savings in allocated memory:
  None in this set but will occur when the superfluous allocations are
  removed from proc_sysctl.c. I include it here for context. The
  estimated savings during boot for config [3] are 6272 bytes. See [8]
  for how to measure it.

Comments/feedback greatly appreciated

Changes in v3:
- Rebased to v6.9-rc1
- wrote a shorter cover letter
- Removed willy at infradead.org from cc
- Link to v2: https://lore.kernel.org/r/20240104-jag-sysctl_remove_empty_elem_kernel-v2-0-836cc04e00ec@samsung.com

Changes in v2:
- No functional changes; I resent it as I did not see it in the latest
  sysctl-next. It might be a bit too late to include it in 6.7 version,
  but this v2 can be used for 6.8 when it comes out.
- Rebased on top of v6.7-rc6
- Added trailers to the relevant commits.
- Link to v1: https://lore.kernel.org/r/20231107-jag-sysctl_remove_empty_elem_kernel-v1-0-e4ce1388dfa0@samsung.com
Best

Joel

[1] https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
[2] https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/joel.granados/linux.git/tag/?h=sysctl_remove_empty_elem_v5
[4] https://gist.github.com/Joelgranados/feaca7af5537156ca9b73aeaec093171

[5]
Links Related to the ctl_table sentinel removal:
* Good summaries from Luis:
  https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/
  https://lore.kernel.org/all/ZMFizKFkVxUFtSqa@bombadil.infradead.org/
* Patches adjusting sysctl register calls:
  https://lore.kernel.org/all/20230302204612.782387-1-mcgrof@kernel.org/
  https://lore.kernel.org/all/20230302202826.776286-1-mcgrof@kernel.org/
* Discussions about expectations and approach
  https://lore.kernel.org/all/20230321130908.6972-1-frank.li@vivo.com
  https://lore.kernel.org/all/20220220060626.15885-1-tangmeng@uniontech.com

[6]
add/remove: 0/0 grow/shrink: 0/31 up/down: 0/-1984 (-1984)
Function                                     old     new   delta
watchdog_sysctls                             576     512     -64
watchdog_hardlockup_sysctl                   128      64     -64
vm_table                                    1344    1280     -64
uts_kern_table                               448     384     -64
usermodehelper_table                         192     128     -64
user_table                                   832     768     -64
user_event_sysctls                           128      64     -64
timer_sysctl                                 128      64     -64
signal_debug_table                           128      64     -64
seccomp_sysctl_table                         192     128     -64
sched_rt_sysctls                             256     192     -64
sched_fair_sysctls                           256     192     -64
sched_energy_aware_sysctls                   128      64     -64
sched_dl_sysctls                             192     128     -64
sched_core_sysctls                           384     320     -64
sched_autogroup_sysctls                      128      64     -64
printk_sysctls                               512     448     -64
pid_ns_ctl_table_vm                          128      64     -64
pid_ns_ctl_table                             128      64     -64
latencytop_sysctl                            128      64     -64
kprobe_sysctls                               128      64     -64
kexec_core_sysctls                           256     192     -64
kern_table                                  2560    2496     -64
kern_reboot_table                            192     128     -64
kern_panic_table                             192     128     -64
kern_exit_table                              128      64     -64
kern_delayacct_table                         128      64     -64
kern_acct_table                              128      64     -64
hung_task_sysctls                            448     384     -64
ftrace_sysctls                               128      64     -64
bpf_syscall_table                            192     128     -64
Total: Before=429912331, After=429910347, chg -0.00%

[7]
add/remove: 0/1 grow/shrink: 0/16 up/down: 0/-1027 (-1027)
Function                                     old     new   delta
sched_core_sysctl_init                        39      36      -3
vm_table                                    1024     960     -64
uts_kern_table                               448     384     -64
usermodehelper_table                         192     128     -64
user_table                                   704     640     -64
signal_debug_table                           128      64     -64
seccomp_sysctl_table                         192     128     -64
sched_rt_sysctls                             256     192     -64
sched_fair_sysctls                           128      64     -64
sched_dl_sysctls                             192     128     -64
sched_core_sysctls                            64       -     -64
printk_sysctls                               512     448     -64
pid_ns_ctl_table_vm                          128      64     -64
kern_table                                  1920    1856     -64
kern_reboot_table                            192     128     -64
kern_panic_table                             128      64     -64
kern_exit_table                              128      64     -64
Total: Before=8522228, After=8521201, chg -0.01%

[8]
To measure the in memory savings apply this on top of this patchset.

"
"
diff --git i/fs/proc/proc_sysctl.c w/fs/proc/proc_sysctl.c
index 37cde0efee57..896c498600e8 100644
--- i/fs/proc/proc_sysctl.c
+++ w/fs/proc/proc_sysctl.c
@@ -966,6 +966,7 @@ static struct ctl_dir *new_dir(struct ctl_table_set *set,
        table[0].procname = new_name;
        table[0].mode = S_IFDIR|S_IRUGO|S_IXUGO;
        init_header(&new->header, set->dir.header.root, set, node, table, 1);
+       printk("%ld sysctl saved mem kzalloc\n", sizeof(struct ctl_table));

        return new;
 }
@@ -1189,6 +1190,7 @@ static struct ctl_table_header *new_links(struct ctl_dir *dir, s>
                link_name += len;
                link++;
        }
+       printk("%ld sysctl saved mem kzalloc\n", sizeof(struct ctl_table));
        init_header(links, dir->header.root, dir->header.set, node, link_table,
                    head->ctl_table_size);
        links->nreg = nr_entries;
"
and then run the following bash script in the kernel:

accum=0
for n in $(dmesg | grep kzalloc | awk '{print $3}') ; do
    accum=$(calc "$accum + $n")
done
echo $accum

---
Signed-off-by: Joel Granados <j.granados at samsung.com>

---
Joel Granados (10):
      kernel misc:  Remove the now superfluous sentinel elements from ctl_table array
      umh:  Remove the now superfluous sentinel elements from ctl_table array
      ftrace: Remove the now superfluous sentinel elements from ctl_table array
      timekeeping:  Remove the now superfluous sentinel elements from ctl_table array
      seccomp: Remove the now superfluous sentinel elements from ctl_table array
      scheduler: Remove the now superfluous sentinel elements from ctl_table array
      printk: Remove the now superfluous sentinel elements from ctl_table array
      kprobes: Remove the now superfluous sentinel elements from ctl_table array
      delayacct:  Remove the now superfluous sentinel elements from ctl_table array
      bpf: Remove the now superfluous sentinel elements from ctl_table array

 kernel/acct.c                    | 1 -
 kernel/bpf/syscall.c             | 1 -
 kernel/delayacct.c               | 1 -
 kernel/exit.c                    | 1 -
 kernel/hung_task.c               | 1 -
 kernel/kexec_core.c              | 1 -
 kernel/kprobes.c                 | 1 -
 kernel/latencytop.c              | 1 -
 kernel/panic.c                   | 1 -
 kernel/pid_namespace.c           | 1 -
 kernel/pid_sysctl.h              | 1 -
 kernel/printk/sysctl.c           | 1 -
 kernel/reboot.c                  | 1 -
 kernel/sched/autogroup.c         | 1 -
 kernel/sched/core.c              | 1 -
 kernel/sched/deadline.c          | 1 -
 kernel/sched/fair.c              | 1 -
 kernel/sched/rt.c                | 1 -
 kernel/sched/topology.c          | 1 -
 kernel/seccomp.c                 | 1 -
 kernel/signal.c                  | 1 -
 kernel/stackleak.c               | 1 -
 kernel/sysctl.c                  | 2 --
 kernel/time/timer.c              | 1 -
 kernel/trace/ftrace.c            | 1 -
 kernel/trace/trace_events_user.c | 1 -
 kernel/ucount.c                  | 3 +--
 kernel/umh.c                     | 1 -
 kernel/utsname_sysctl.c          | 1 -
 kernel/watchdog.c                | 2 --
 30 files changed, 1 insertion(+), 33 deletions(-)
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20231107-jag-sysctl_remove_empty_elem_kernel-7de90cfd0c0a

Best regards,
-- 
Joel Granados <j.granados at samsung.com>





More information about the kexec mailing list