RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - anyone else seeing this?

Paul E. McKenney paulmck at linux.vnet.ibm.com
Wed Sep 6 05:28:44 PDT 2017


On Tue, Aug 22, 2017 at 08:26:37AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 22, 2017 at 02:21:32PM +0530, Abdul Haleem wrote:
> > On Tue, 2017-08-22 at 08:49 +0100, Jonathan Cameron wrote:

[ . . . ]

> > No more RCU stalls on PowerPC, system is clean when idle or with some
> > test runs.
> > 
> > Thank you all for your time and efforts in fixing this.
> > 
> > Reported-and-Tested-by: Abdul Haleem <abdhalee at linux.vnet.ibm.com>
> 
> I am still seeing failures, but then again I am running rcutorture with
> lots of CPU hotplug activity.  So I am probably seeing some other bug,
> though it still looks a lot like a lost timer.

So one problem appears to be a timing-related deadlock between RCU and
timers.  The way that this can happen is that the outgoing CPU goes
offline (as in cpuhp_report_idle_dead() invoked from do_idle()) with
one of RCU's grace-period kthread's timers queued.  Now, if someone
waits for a grace period, either directly or indirectly, in a way that
blocks the hotplug notifiers, execution will never reach timers_dead_cpu(),
which means that the grace-period kthread will never wake, which will
mean that the grace period will never complete.  Classic deadlock.

I currently have an extremely ugly workaround for this deadlock, which
is to periodically and (usually) redundantly wake up all the RCU
grace-period kthreads from the scheduling-interrupt handler.  This is
of course completely inappropriate for mainline, but it does reliably
prevent the "kthread starved for %ld jiffies!" type of RCU CPU stall
warning that I would otherwise see.

To mainline this, one approach would be to make the timers switch to
add_timer_on() to a surviving CPU once the offlining process starts.
Alternatively, I suppose that RCU could do the redundant-wakeup kludge,
but with checks to prevent it from happening unless (1) there is a CPU
in the process of going offline, (2) there is an RCU grace period in
progress, and (3) the RCU grace-period kthread has been blocked for
(say) three times longer than it should have been.

Unfortunately, this is not sufficient to make rcutorture run reliably,
though it does help, which is of course to say that it makes debugging
slower.  ;-)

What happens now is that random rcutorture kthreads will hang waiting for
timeouts to complete.  This confused me for a while because I expected
that the timeouts would be delayed during offline processing, but that
my crude deadlock-resolution approach would eventually get things going.
My current suspicion is that the problem is due to a potential delay
between the time an outgoing CPU hits cpuhp_report_idle_dead() and the
timers get migrated from timers_dead_cpu().  This means that the CPU
adopting the timers might be a few ticks ahead of where the outgoing CPU
last processed timers.  My current guess is that any timers queued in
intervening indexes are going to wait one good long time.  And I don't see
any code in timers_dead_cpu() that would account for this possibility,
though I of course cannot claim to fully understand this code.

Is this plausible, or am I confused?  (Either way, -something- besides
just me is rather thoroughly confused!)

If this is plausible, my guess is that timers_dead_cpu() needs to check
for mismatched indexes (in timer->flags?) and force any intervening
timers to expire if so.

Thoughts?

							Thanx, Paul



