[BUG] 2.6.37-rc3 massive interactivity regression on ARM

Mikael Pettersson mikpe at it.uu.se
Sun Dec 5 07:32:37 EST 2010


Mikael Pettersson writes:
 > The scenario is that I do a remote login to an ARM build server,
 > use screen to start a sub-shell, in that shell start a largish
 > compile job, detach from that screen, and from the original login
 > shell I occasionally monitor the compile job with top or ps or
 > by attaching to the screen.
 > 
 > With kernels 2.6.37-rc2 and -rc3 this causes the machine to become
 > very sluggish: top takes forever to start, once started it shows no
 > activity from the compile job (it's as if it's sleeping on a lock),
 > and ps also takes forever and shows no activity from the compile job.
 > 
 > Rebooting into 2.6.36 eliminates these issues.
 > 
 > I do pretty much the same thing (remote login -> screen -> compile job)
 > on other archs, but so far I've only seen the 2.6.37-rc misbehaviour
 > on ARM EABI, specifically on an IOP n2100. (I have access to other ARM
 > sub-archs, but haven't had time to test 2.6.37-rc on them yet.)
 > 
 > Has anyone else seen this? Any ideas about the cause?

(Re-followup since I just realised my previous followups were to Rafael's
regressions mailbot rather than the original thread.)

> The bug is still present in 2.6.37-rc4.  I'm currently trying to bisect it.

git bisect identified

[305e6835e05513406fa12820e40e4a8ecb63743c] sched: Do not account irq time to current task

as the cause of this regression.  Reverting it from 2.6.37-rc4 (requires some
hackery due to subsequent changes in the same area) restores sane behaviour.

The original patch submission talks about irq-heavy scenarios.  My case is the
exact opposite: UP, !PREEMPT, NO_HZ, very low irq rate, essentially 100% CPU
bound in userspace but expected to schedule quickly when needed (e.g. running
top or ps or just hitting CR in one shell while another runs a compile job).
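
For anyone wanting to check whether another machine matches this setup, here is a
minimal sketch in Python that inspects the text of a kernel .config. The function
names are hypothetical, and it assumes the three options involved are CONFIG_SMP,
CONFIG_PREEMPT and CONFIG_NO_HZ, as in 2.6.37-era kernels.

```python
def parse_kconfig(text):
    """Return {option: value} for every CONFIG_* option set in a kernel .config."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Unset options appear as "# CONFIG_FOO is not set" and are skipped here.
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def matches_reported_setup(options):
    """True for a UP, !PREEMPT, NO_HZ configuration (assumed option names)."""
    return (
        options.get("CONFIG_SMP") != "y"          # UP: SMP not enabled
        and options.get("CONFIG_PREEMPT") != "y"  # !PREEMPT: no full preemption
        and options.get("CONFIG_NO_HZ") == "y"    # NO_HZ: tickless idle enabled
    )
```

Feeding it the contents of the build's .config (or /proc/config.gz after
decompression, if the kernel provides it) should be enough to compare machines.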

I've reproduced the misbehaviour with 2.6.37-rc4 on ARM/mach-iop32x and
ARM/mach-ixp4xx, but ARM/mach-kirkwood does not misbehave, and other archs
(x86 SMP, SPARC64 UP and SMP, PowerPC32 UP, Alpha UP) also do not misbehave.

So it looks like an ARM-only issue, possibly depending on platform specifics.

One difference I noticed between my Kirkwood machine and my ixp4xx and iop32x
machines is that even though all have CONFIG_NO_HZ=y, the timer irq rate is
much higher on Kirkwood, even when the machine is idle.
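
To put numbers on that difference, the per-irq rate can be estimated from two
snapshots of /proc/interrupts. The following is a rough sketch in Python, not the
method used for the observation above; the function names are hypothetical, and it
assumes the usual layout of a CPU header line followed by "label: count(s)
description" lines.

```python
import time


def parse_interrupts(text):
    """Map each irq label in /proc/interrupts text to its count summed over CPUs."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header line holds one "CPUn" column per CPU
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: ..." line
        fields = rest.split()
        # The first ncpu fields are per-CPU counts; the rest is the description.
        counts[label.strip()] = sum(int(f) for f in fields[:ncpu] if f.isdigit())
    return counts


def irq_rates(before, after, seconds):
    """Interrupts per second for each irq between two parsed snapshots."""
    return {k: (after[k] - before.get(k, 0)) / seconds for k in after}


def sample_rates(interval=10.0, path="/proc/interrupts"):
    """Read the file twice, `interval` seconds apart, and return the rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_rates(before, after, interval)
```

Calling sample_rates() on an otherwise idle machine and looking at the timer line
should show whether the tick is really being stopped.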

/Mikael
