irqbalance problem on Oracle X5-2

Neil Horman nhorman at tuxdriver.com
Fri Nov 13 13:39:08 PST 2015


On Fri, Nov 13, 2015 at 01:39:20PM -0500, Mohsin Zaidi wrote:
> Thanks for your reply, Neil.
> 
> Yes, when I manually set the irq affinity to avoid #18, it works.
> 
> I just downloaded and applied the latest irqbalance code, but it's
> showing the same behavior.
> 
What hint policy are you using?

Neil

> Regards,
> Mohsin
> 
> 
> On Fri, Nov 13, 2015 at 8:46 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> > On Thu, Nov 12, 2015 at 03:59:46PM -0500, Mohsin Zaidi wrote:
> >> Hello,
> >>
> >> We’ve run into an irqbalance CPU banning issue that seems to be
> >> present in version 1.0.4 as well as in newer versions 1.0.7 and 1.0.9.
> >>
> >> On an Oracle X5-2 with 72 cores, irqbalance keeps concentrating IRQs
> >> from one interface (eth03, the active slave in a bonded pair running
> >> network traffic) on CPUs 18 and 37 (mostly #18), even though all CPUs
> >> except 1 and 37 have been banned from IRQ processing. We’re seeing this
> >> on multiple X5-2s. The interrupts are never directed to CPU 1. This does
> >> not seem to be a problem on the other 32-core servers we have.
> >>
> >> I’ve attached the top CPU list, /proc/interrupts for eth03, irqbalance
> >> debug output, smp_affinity for eth03 IRQs (548-611), and the hardware
> >> topology.
> >>
> >> Any help would be appreciated. Please let me know if I can provide any
> >> additional information.
> >>
> >> Regards,
> >> Mohsin
> >
> > A few initial questions
> >
> > Are you able to set irq affinity manually on these systems?  And are you able to
> > see those affinities take effect?  I ask because the smp_affinity output you
> > sent me makes it look like writes to that file for a given interrupt aren't
> > getting picked up, and so the hardware is actually deciding where to steer
> > interrupts.
> >
> > Have you tried using an upstream version of irqbalance?  I ask because commit
> > f1bf15ed7ea63a04c76da033b78f8ffc806d4517, which came out after 1.0.9, fixes a
> > problem in which the --banirq option stopped working after an irq db reparse.
> >
> > Neil
> >
> 
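The manual affinity check discussed above comes down to reading and writing
CPU bitmasks in /proc/irq/<irq>/smp_affinity. The following is a minimal
Python sketch of that arithmetic, not anything taken from irqbalance itself.
The helper names (cpus_to_mask, mask_to_cpus, read_irq_affinity) are made up
for illustration, and the 72-CPU count and the CPU numbers 1, 18 and 37 are
simply the values mentioned in the thread. It assumes the usual kernel mask
format: comma-separated 32-bit hex groups, most significant group first.

```python
NCPUS = 72  # assumption: logical CPU count reported in the thread


def cpus_to_mask(cpus, ncpus=NCPUS):
    """Build a kernel-style hex CPU mask from an iterable of CPU numbers."""
    value = 0
    for cpu in cpus:
        if not 0 <= cpu < ncpus:
            raise ValueError(f"CPU {cpu} is outside 0..{ncpus - 1}")
        value |= 1 << cpu
    ngroups = (ncpus + 31) // 32  # one group per 32 CPUs, rounded up
    groups = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(ngroups)]
    # Most significant group goes first, each padded to 8 hex digits.
    return ",".join(f"{g:08x}" for g in reversed(groups))


def mask_to_cpus(mask):
    """Return the sorted list of CPU numbers set in a kernel-style hex mask."""
    value = int(mask.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def read_irq_affinity(irq):
    """Read the CPUs currently allowed for one IRQ (Linux only)."""
    with open(f"/proc/irq/{irq}/smp_affinity") as f:
        return mask_to_cpus(f.read())


# Mask that allows only CPUs 1 and 37, and the complementary "banned" mask.
allowed = cpus_to_mask([1, 37])
banned = cpus_to_mask(set(range(NCPUS)) - {1, 37})
print("allowed:", allowed)
print("banned: ", banned)
```

To see whether a manual write actually took effect, one could write the
"allowed" value to /proc/irq/<irq>/smp_affinity as root, call
read_irq_affinity() on the same IRQ, and then watch whether the counters in
/proc/interrupts grow on the expected CPUs. The kernel may print the top
group with fewer digits (for example "00,00000000,00040000" on a 72-CPU
machine); mask_to_cpus() accepts that form as well.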


