[patch 00/13] preempt: Make preempt count unconditional

Mon Sep 14 17:55:24 EDT 2020

On Mon, Sep 14 2020 at 13:59, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner <tglx at linutronix.de> wrote:
>>
>> Recently merged code does:
>>
>>          gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
>>
>> Looks obviously correct, except for the fact that preemptible() is
>> unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in
>> that code use GFP_ATOMIC on such kernels.
>
> I don't think this is a good reason to entirely get rid of the
> no-preempt thing.

I did not say that this is a good reason. It just illustrates the
problem.

> The above is just garbage. It's bogus. You can't do it.
>
> Blaming the no-preempt code for this bug is extremely unfair, imho.

I'm not blaming the no-preempt code. I'm blaming inconsistency and there
is no real good argument for inconsistent behaviour, TBH.

> And the no-preempt code does help make for much better code generation
> for simple spinlocks.

Yes it does generate better code, but I tried hard to spot a difference
in various metrics exposed by perf. It's all in the noise and I only
can spot a difference when the actual preemption check after the
decrement which still depends on CONFIG_PREEMPT is in place, but that's
not the case for PREEMPT_NONE or PREEMPT_VOLUNTARY kernels where the
decrement is just a decrement w/o any conditional behind it.

> Where is that horribly buggy recent code? It's not in that exact
> format, certainly, since 'grep' doesn't find it.

Bah, that was stuff in next which got dropped again.

But just look at any check which uses preemptible(), especially those
which check !preemptible():

In the X86 #GP handler we have:

	/*
	 * To be potentially processing a kprobe fault and to trust the result
	 * from kprobe_running(), we have to be non-preemptible.
	 */
	if (!preemptible() &&
	    kprobe_running() &&
	    kprobe_fault_handler(regs, X86_TRAP_GP))
		goto exit;

and a similar check in the S390 code in kprobe_exceptions_notify(). That
all magically 'works' because that code might have been actually tested
with lockdep enabled which enforces PREEMPT_COUNT...

The SG code has some interesting usage as well:

		if (miter->__flags & SG_MITER_ATOMIC) {
			WARN_ON_ONCE(preemptible());
			kunmap_atomic(miter->addr);

How is that WARN_ON_ONCE() supposed to catch anything? Especially as
calling code does:

	flags = SG_MITER_TO_SG;
	if (!preemptible())
		flags |= SG_MITER_ATOMIC;

which is equally useless on kernels which have PREEMPT_COUNT=n.

There are bugs which are related to in_atomic() or other in_***() usage
all over the place as well.

Inconsistency at the core level is a clear recipe for disaster and at
some point we have to bite the bullet and accept that consistency is
more important than the non measurable 3 cycles?

Thanks,

        tglx