[patch 00/13] preempt: Make preempt count unconditional
Thomas Gleixner
tglx at linutronix.de
Mon Sep 14 16:42:09 EDT 2020
Folks!
While working on various preempt count related things, I stumbled (again)
over the inconsistency of our preempt count handling.
The handling of preempt_count() is inconsistent accross kernel
configurations. On kernels which have PREEMPT_COUNT=n
preempt_disable/enable() and the lock/unlock functions are not affecting
the preempt count, only local_bh_disable/enable() and _bh variants of
locking, soft interrupt delivery, hard interrupt and NMI context affect it.
It's therefore impossible to have a consistent set of checks which provide
information about the context in which a function is called. In many cases
it makes sense to have seperate functions for seperate contexts, but there
are valid reasons to avoid that and handle different calling contexts
conditionally.
The lack of such indicators which work on all kernel configuratios is a
constant source of trouble because developers either do not understand the
implications or try to work around this inconsistency in weird
ways. Neither seem these issues be catched by reviewers and testing.
Recently merged code does:
gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
Looks obviously correct, except for the fact that preemptible() is
unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in
that code use GFP_ATOMIC on such kernels.
Attempts to make preempt count unconditional and consistent have been
rejected in the past with handwaving performance arguments.
Freshly conducted benchmarks did not reveal any measurable impact from
enabling preempt count unconditionally. On kernels with CONFIG_PREEMPT_NONE
or CONFIG_PREEMPT_VOLUNTARY the preempt count is only incremented and
decremented but the result of the decrement is not tested. Contrary to that
enabling CONFIG_PREEMPT which tests the result has a small but measurable
impact due to the conditional branch/call.
It's about time to make essential functionality of the kernel consistent
accross the various preemption models.
The series is also available from git:
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git preempt
That's the first part of a larger effort related to preempt count:
1) The analysis of the usage sites of in_interrupt(), in_atomic(),
in_softirq() is still ongoing, but so far the number of buggy users is
clearly the vast majority. There will be seperate patch series
(currently 46 and counting) to address these issues once the analysis
is complete in the next days.
2) The long discussed state tracking of local irq disable in preempt count
which accounts interrupt disabled sections as atomic and avoids issuing
costly instructions (sti, cli, popf or their non X86 counterparts) when
the state does not change, i.e. nested irq_save() or irq_restore(). I
have this working on X86 already and contrary to my earlier attempts
this was reasonably straight forward due to the recent entry/exit code
consolidation.
What I've not done yet is to optimize the preempt count handling
of the [un]lock_irq* operations so they handle the interrupt disabled
state and the preempt count modification in one go. That's an obvious
add on, but correctness first ...
3) Lazy interrupt disabling as a straight forward extension to #2. This
avoids the actual disabling at the CPU level completely and catches an
incoming interrupt in the low level entry code, modifies the interrupt
disabled state on the return stack, notes the interrupt as pending in
software and raises it again when interrupts are reenabled. This has
still a few issues which I'm hunting down (cpuidle is unhappy ...)
Thanks,
tglx
---
arch/arm/include/asm/assembler.h | 11 --
arch/arm/kernel/iwmmxt.S | 2
arch/arm/mach-ep93xx/crunch-bits.S | 2
arch/xtensa/kernel/entry.S | 2
drivers/gpu/drm/i915/Kconfig.debug | 1
drivers/gpu/drm/i915/i915_utils.h | 3
include/linux/bit_spinlock.h | 4 -
include/linux/lockdep.h | 6 -
include/linux/pagemap.h | 4 -
include/linux/preempt.h | 37 +---------
include/linux/uaccess.h | 6 -
kernel/Kconfig.preempt | 4 -
kernel/sched/core.c | 6 -
lib/Kconfig.debug | 3
lib/Kconfig.debug.rej | 14 +--
tools/testing/selftests/rcutorture/configs/rcu/SRCU-t | 1
tools/testing/selftests/rcutorture/configs/rcu/SRCU-u | 1
tools/testing/selftests/rcutorture/configs/rcu/TINY01 | 1
tools/testing/selftests/rcutorture/doc/TINY_RCU.txt | 5 -
tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 1
tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h | 1
21 files changed, 23 insertions(+), 92 deletions(-)
More information about the linux-um
mailing list