irqbalance problem on Oracle X5-2

Mohsin Zaidi mohsinrzaidi at gmail.com
Thu Nov 19 10:32:58 PST 2015


Thanks, Neil. I'll have the results for you shortly.

I wanted to point out that each of the 4 interfaces on the server has
64 queues, so there are a total of 256 queues. Also, the banning is
attempting to direct interrupts to just two processors (#1 and #37) on
the same NUMA node, which is also not the same as the NUMA node that
"owns" the interface I am looking at (eth03).

Does any of this matter?
Regards,
Mohsin
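
For reference, the banned mask quoted below can be checked by hand or with a few lines of script. The following is only a rough sketch, assuming the mask uses the kernel's usual cpumask layout (comma-separated 32-bit hex words, most significant word first) and assuming the box exposes 72 logical CPUs, which is inferred from the width of the mask rather than stated anywhere in this thread. The helper names cpus_in_mask and mask_for_cpus are made up for illustration and are not part of irqbalance.

```python
def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in a comma-separated hex mask."""
    value = 0
    # The rightmost word covers CPUs 0-31, the next one CPUs 32-63, and so on.
    for i, word in enumerate(reversed(mask.split(","))):
        value |= int(word, 16) << (32 * i)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def mask_for_cpus(cpus, ncpus):
    """Build a comma-separated hex mask with one bit set per CPU in `cpus`."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    nwords = (ncpus + 31) // 32
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(nwords)]
    return ",".join(f"{w:08x}" for w in reversed(words))


BANNED = "000000ff,ffffffdf,fffffffd"  # value from the config file in this thread
NCPUS = 72  # assumption: inferred from the mask width, not confirmed

banned = set(cpus_in_mask(BANNED))
allowed = [cpu for cpu in range(NCPUS) if cpu not in banned]
print("unbanned CPUs:", allowed)
print("CPU 18 banned:", 18 in banned)
```

Under those assumptions the mask leaves only CPUs 1 and 37 unbanned, and CPU 18 is in the banned set, so the mask itself appears to say what I intended.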


On Thu, Nov 19, 2015 at 9:58 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
> On Wed, Nov 18, 2015 at 10:42:41AM -0500, Mohsin Zaidi wrote:
>> I'm using the irqbalance daemon with the following config file. The
>> only thing I've changed is the banned CPUs list, and I've banned all
>> but CPUs #1 and #37. Interrupts *never* go to #1, and go to #18 and
>> #37, even though #18 has also been banned.
>>
>> # irqbalance is a daemon process that distributes interrupts across
>> # CPUS on SMP systems. The default is to rebalance once every 10
>> # seconds. This is the environment file that is specified to systemd via the
>> # EnvironmentFile key in the service unit file (or via whatever method the init
>> # system you're using has).
>> #
>> # ONESHOT=yes
>> # after starting, wait for a minute, then look at the interrupt
>> # load and balance it once; after balancing exit and do not change
>> # it again.
>> #IRQBALANCE_ONESHOT=
>>
>> #
>> # IRQBALANCE_BANNED_CPUS
>> # 64 bit bitmask which allows you to indicate which cpu's should
>> # be skipped when rebalancing irqs. Cpu numbers which have their
>> # corresponding bits set to one in this mask will not have any
>> # irq's assigned to them on rebalance
>> #
>> #IRQBALANCE_BANNED_CPUS=
>> IRQBALANCE_BANNED_CPUS=000000ff,ffffffdf,fffffffd
>>
>> #
>> # IRQBALANCE_ARGS
>> # append any args here to the irqbalance daemon as documented in the man page
>> #
>> #IRQBALANCE_ARGS=
>> Regards,
>> Mohsin
>>
>>
>> On Wed, Nov 18, 2015 at 10:28 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>> > On Wed, Nov 18, 2015 at 10:04:56AM -0500, Mohsin Zaidi wrote:
>> >> Sorry about that, Neil.
>> >>
>> >> I haven't specified any hint policy in IRQBALANCE_ARGS (for the daemon).
>> >> Regards,
>> >> Mohsin
>> >>
>> > Ok, well, I'm at a bit of a loss.  irqbalance, based on your output from the
>> > debug log, is working properly, presuming you actually listed cpus 18 and 37 as
>> > your only unbanned ones, which you indicate is the opposite of what you've
>> > configured.
>> >
>> > Can you please send me the command line you use to start irqbalance?
>> >
>> > Neil
>> >
>> >>
>> >> On Wed, Nov 18, 2015 at 6:36 AM, Neil Horman <nhorman at tuxdriver.com> wrote:
>> >> > On Fri, Nov 13, 2015 at 04:39:08PM -0500, Neil Horman wrote:
>> >> >> On Fri, Nov 13, 2015 at 01:39:20PM -0500, Mohsin Zaidi wrote:
>> >> >> > Thanks for your reply, Neil.
>> >> >> >
>> >> >> > Yes, when I manually set the irq affinity to avoid #18, it works.
>> >> >> >
>> >> >> > I just downloaded and applied the latest irqbalance code, but it's
>> >> >> > showing the same behavior.
>> >> >> >
>> >> >> What hint policy are you using?
>> >> >>
>> >> >> Neil
>> >> >>
>> >> > Ping, any response regarding hint policy?
>> >> >
>> >> > Neil
>> >> >
>> >>
>>
>
> I'm at something of a loss here.  I can see no reason why this would fail on
> only one system.  In an effort to get additional data, please apply this patch,
> run irqbalance in debug mode, and post the output.
>
> Thanks!
> Neil
>
>
> diff --git a/activate.c b/activate.c
> index c8453d5..d92e770 100644
> --- a/activate.c
> +++ b/activate.c
> @@ -113,6 +113,7 @@ static void activate_mapping(struct irq_info *info, void *data __attribute__((un
>                 return;
>
>         cpumask_scnprintf(buf, PATH_MAX, applied_mask);
> +       printf("Applying mask for irq %d: %s\n", info->irq, buf);
>         fprintf(file, "%s", buf);
>         fclose(file);
>         info->moved = 0; /*migration is done*/


