irqbalance problem on Oracle X5-2

Neil Horman nhorman at tuxdriver.com
Fri Nov 20 13:23:58 PST 2015


On Fri, Nov 20, 2015 at 01:45:37PM -0500, Mohsin Zaidi wrote:
> Some more observations.
> 
> When I said yesterday that changing the unbanned CPUs to 19/55 or
> 18/54 worked correctly for all IRQs, I failed to notice that of the
> 256 IRQs for the interfaces, 3 would never have their affinities
> updated correctly.
> 
> For example, with the banning mask set to "ff,ff7fffff,fff7ffff", the
> smp_affinity_list values for the last 10 IRQs are as follows:
> 
> 19
> 55
> 26
> 55
> 24
> 55
> 19
> 19
> 19
> 22
> 
> 3 of these keep whatever affinity was last set for them (my last test
> was to unban all CPUs). I see this pattern repeated every time.
> 
> I changed the test to unban 18-19,54-55 at the same time, and this
> problem went away. When I unbanned just 19/55 and reduced the number
> of queues per interface by one, the problem also went away.
> 
> It's as if 2 CPUs can't be successfully assigned 256 IRQs. This also
> holds true if the CPUs are not siblings (e.g. 19/54).
> 
I wonder if this is a hardware limitation (i.e. if you're hitting the upper
limit of the eligible cpu set in an MSI write or some such).
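
As a sanity check on my end, here's a rough sketch of how I'm decoding those
comma-separated hex words back into cpu numbers (the default mask string below
is just the one from your mail, and the program happily prints cpu numbers
higher than the box actually has, so ignore anything past your last cpu):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rough sketch: decode a banned-cpu mask in the comma-separated 32-bit
 * hex word format (most significant word first, cpu 0 in the lowest bit
 * of the last word) and print the cpus whose bits are *clear*, i.e. the
 * ones left unbanned.
 */
int main(int argc, char **argv)
{
	const char *mask = (argc > 1) ? argv[1] : "ff,ff7fffff,fff7ffff";
	unsigned int words[32];
	char buf[128];
	int nwords = 0;
	char *tok;

	snprintf(buf, sizeof(buf), "%s", mask);
	for (tok = strtok(buf, ","); tok && nwords < 32; tok = strtok(NULL, ","))
		words[nwords++] = (unsigned int)strtoul(tok, NULL, 16);

	for (int i = 0; i < nwords; i++) {
		/* words[] is most-significant-first, so walk it backwards */
		unsigned int w = words[nwords - 1 - i];

		for (int bit = 0; bit < 32; bit++)
			if (!(w & (1u << bit)))
				printf("cpu %d unbanned\n", i * 32 + bit);
	}
	return 0;
}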

If you manually set all irqs to a single cpu, what happens?
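
Something like this rough sketch would do it, assuming every irq of interest
shows up under /proc/irq; the cpu number here is only an example, and irqs
that refuse the write are just reported and skipped:

#include <ctype.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>

/*
 * Rough sketch: write the same cpu number into every
 * /proc/irq/<n>/smp_affinity_list.  The target cpu defaults to 19 here
 * purely as an example; pass a different one as argv[1].  Needs root.
 */
int main(int argc, char **argv)
{
	const char *cpu = (argc > 1) ? argv[1] : "19";
	char path[PATH_MAX];
	struct dirent *ent;
	DIR *dir = opendir("/proc/irq");

	if (!dir) {
		perror("/proc/irq");
		return 1;
	}

	while ((ent = readdir(dir)) != NULL) {
		FILE *f;
		int bad;

		if (!isdigit((unsigned char)ent->d_name[0]))
			continue;
		snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity_list",
			 ent->d_name);
		f = fopen(path, "w");
		if (!f)
			continue;
		bad = (fprintf(f, "%s\n", cpu) < 0);
		/* procfs errors typically surface when the buffer is flushed at close */
		if (fclose(f) || bad)
			fprintf(stderr, "irq %s: could not apply affinity\n",
				ent->d_name);
	}
	closedir(dir);
	return 0;
}

That would tell us whether those three irqs drift again on their own, or
whether the writes for them fail outright.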

Neil

> So there are two dimensions to the problem. One is that choosing CPUs
> only on NUMA node 0 doesn't work, and the other is that assigning 256 IRQs
> to 2 CPUs on NUMA node 1 doesn't work.
> Regards,
> Mohsin
> 
> 
> On Fri, Nov 20, 2015 at 9:45 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> > On Thu, Nov 19, 2015 at 01:32:58PM -0500, Mohsin Zaidi wrote:
> >> Thanks, Neil. I'll have the results for you shortly.
> >>
> >> I wanted to point out that each of the 4 interfaces on the server has
> >> 64 queues, for a total of 256 queues. Also, the banning is
> >> attempting to direct interrupts to just two processors (#1 and #37) on
> >> the same NUMA node, which is also not the same as the NUMA node that
> >> "owns" the interface I am looking at (eth03).
> >>
> >> Does any of this matter?
> > It really shouldn't, but given that I'm still at a loss to explain the behavior,
> > anything is on the table.
> > Neil
> >
> >> Regards,
> >> Mohsin
> >>
> >>
> >> On Thu, Nov 19, 2015 at 9:58 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> >> > On Wed, Nov 18, 2015 at 10:42:41AM -0500, Mohsin Zaidi wrote:
> >> >> I'm using the irqbalance daemon with the following config file. The
> >> >> only thing I've changed is the banned CPUs list, and I've banned all
> >> >> but CPUs #1 and #37. Interrupts *never* go to #1, and go to #18 and
> >> >> #37, even though #18 has also been banned.
> >> >>
> >> >> # irqbalance is a daemon process that distributes interrupts across
> >> >> # CPUS on SMP systems. The default is to rebalance once every 10
> >> >> # seconds. This is the environment file that is specified to systemd via the
> >> >> # EnvironmentFile key in the service unit file (or via whatever method the init
> >> >> # system you're using has).
> >> >> #
> >> >> # ONESHOT=yes
> >> >> # after starting, wait for a minute, then look at the interrupt
> >> >> # load and balance it once; after balancing exit and do not change
> >> >> # it again.
> >> >> #IRQBALANCE_ONESHOT=
> >> >>
> >> >> #
> >> >> # IRQBALANCE_BANNED_CPUS
> >> >> # 64 bit bitmask which allows you to indicate which cpus should
> >> >> # be skipped when rebalancing irqs. Cpu numbers which have their
> >> >> # corresponding bits set to one in this mask will not have any
> >> >> # irqs assigned to them on rebalance
> >> >> #
> >> >> #IRQBALANCE_BANNED_CPUS=
> >> >> IRQBALANCE_BANNED_CPUS=000000ff,ffffffdf,fffffffd
> >> >>
> >> >> #
> >> >> # IRQBALANCE_ARGS
> >> >> # append any args here to the irqbalance daemon as documented in the man page
> >> >> #
> >> >> #IRQBALANCE_ARGS=
> >> >> Regards,
> >> >> Mohsin
> >> >>
> >> >>
> >> >> On Wed, Nov 18, 2015 at 10:28 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> >> >> > On Wed, Nov 18, 2015 at 10:04:56AM -0500, Mohsin Zaidi wrote:
> >> >> >> Sorry about that, Neil.
> >> >> >>
> >> >> >> I haven't specified any hint policy in IRQBALANCE_ARGS (for the daemon).
> >> >> >> Regards,
> >> >> >> Mohsin
> >> >> >>
> >> >> > Ok, well, I'm at a bit of a loss.  irqbalance, based on your output from the
> >> >> > debug log, is working properly, presuming you actually listed cpus 18 and 37 as
> >> >> > your only unbanned ones, which you indicate is the opposite of what you've
> >> >> > configured.
> >> >> >
> >> >> > Can you please send me the command line you use to start irqbalance?
> >> >> >
> >> >> > Neil
> >> >> >
> >> >> >>
> >> >> >> On Wed, Nov 18, 2015 at 6:36 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> >> >> >> > On Fri, Nov 13, 2015 at 04:39:08PM -0500, Neil Horman wrote:
> >> >> >> >> On Fri, Nov 13, 2015 at 01:39:20PM -0500, Mohsin Zaidi wrote:
> >> >> >> >> > Thanks for your reply, Neil.
> >> >> >> >> >
> >> >> >> >> > Yes, when I manually set the irq affinity to avoid #18, it works.
> >> >> >> >> >
> >> >> >> >> > I just downloaded and applied the latest irqbalance code, but it's
> >> >> >> >> > showing the same behavior.
> >> >> >> >> >
> >> >> >> >> What hint policy are you using?
> >> >> >> >>
> >> >> >> >> Neil
> >> >> >> >>
> >> >> >> > Ping, any response regarding hint policy?
> >> >> >> >
> >> >> >> > Neil
> >> >> >> >
> >> >> >>
> >> >>
> >> >
> >> > I'm at something of a loss here.  I can see no reason why this would fail on
> >> > only one system.  In an effort to get additional data, please apply this patch,
> >> > run irqbalance in debug mode, and post the output.
> >> >
> >> > Thanks!
> >> > Neil
> >> >
> >> >
> >> > diff --git a/activate.c b/activate.c
> >> > index c8453d5..d92e770 100644
> >> > --- a/activate.c
> >> > +++ b/activate.c
> >> > @@ -113,6 +113,7 @@ static void activate_mapping(struct irq_info *info, void *data __attribute__((un
> >> >                 return;
> >> >
> >> >         cpumask_scnprintf(buf, PATH_MAX, applied_mask);
> >> > +       printf("Applying mask for irq %d: %s\n", info->irq, buf);
> >> >         fprintf(file, "%s", buf);
> >> >         fclose(file);
> >> >         info->moved = 0; /*migration is done*/
> >>
> 


