irqbalancer subset policy and CPU lock up on storage controller.
Kashyap Desai
kashyap.desai at avagotech.com
Mon Oct 12 12:15:01 PDT 2015
> On Mon, Oct 12, 2015 at 11:52:30PM +0530, Kashyap Desai wrote:
> > > > What should be the solution if we really want to slow down IO
> > > > submission to avoid CPU lockup. We don't want only one CPU to keep
> > > > busy for completion.
> > > >
> > > > Any suggestion ?
> > > >
> > > Yup, file a bug with Oracle :)
> >
> > Neil -
> >
> > Thanks for the info. I understood the suggestion to use the latest
> > <irqbalance>; that was already attempted. With the latest irqbalance I
> > see the expected behavior as long as I provide <exact>, or <subset> plus
> > <--policyscript>. We are planning to do the same, but wanted to
> > understand what the latest <irqbalance> default settings are. Is there
> > any reason the default was changed from subset to ignore ?
> >
>
> Latest defaults are that hinting is ignored by default, but hinting can
> also be set via a policyscript on an irq by irq basis.
>
> The reasons for changing the default behavior are documented in commit
> d9138c78c3e8cb286864509fc444ebb4484c3d70. Irq affinity hinting is
> effectively a holdover from back in the days when irqbalance couldn't
> understand a device's locality and irq count easily. Now that it can,
> there is really no need for an irq affinity hint, unless your driver
> doesn't properly participate in sysfs device enumeration.
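
For reference, a per-irq hint policy set through a policyscript could look roughly like the Python sketch below. It assumes an irqbalance build whose policyscript interface accepts a hintpolicy=[exact|subset|ignore] key (not every release does) and that irqbalance calls the script with the sysfs device path and the irq number as arguments. HBA_IRQS and policy_lines are hypothetical names; the irq range is simply the 120..151 range from this report and will differ on other systems or after a reboot.

```python
#!/usr/bin/env python3
"""Sketch of an irqbalance policyscript that asks for exact hint handling."""
import sys

# Hypothetical: the mpt3sas MSI-X vectors seen in this report (irq 120..151).
# Real irq numbers are not stable, so a production script would need to
# identify the device from the sysfs path instead.
HBA_IRQS = range(120, 152)


def policy_lines(irq):
    """Return the key=value lines to print for one irq."""
    if irq in HBA_IRQS:
        # Assumption: this irqbalance version understands the hintpolicy key.
        return ["hintpolicy=exact"]
    # Printing nothing leaves the irq on irqbalance's global defaults.
    return []


# Assumed call convention: <script> <sysfs device path> <irq number>.
# The script falls off the end normally, which gives a zero exit code.
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in policy_lines(int(sys.argv[2])):
        print(line)
```

The script would be passed to irqbalance with the --policyscript option mentioned earlier in this thread.
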

Neil - I went through those details, but could not understand how the
<ignore> policy is useful. I may be missing something here. :-(

With the <ignore> policy, the mpt3sas driver on a 32 logical CPU system
gets the affinity masks below. As you said, the driver hint is ignored.
That is understood, since <ignore> asks for exactly that, but why is each
affinity mask restricted to the local node (Node 0 in this case) ?

What confuses me is that the "cpu affinity mask" is localized to NUMA
Node-0 only, because PCI device enumeration detected that the PCI device
is local to numa_node 0.

CPU mask on node-0 is 00FF00FF.

msix index = 0, irq number = 120, cpu affinity mask = 00400040, hint = 00000001
msix index = 1, irq number = 121, cpu affinity mask = 00800080, hint = 00000002
msix index = 2, irq number = 122, cpu affinity mask = 00400040, hint = 00000004
msix index = 3, irq number = 123, cpu affinity mask = 00100010, hint = 00000008
msix index = 4, irq number = 124, cpu affinity mask = 00800080, hint = 00000010
msix index = 5, irq number = 125, cpu affinity mask = 00020002, hint = 00000020
msix index = 6, irq number = 126, cpu affinity mask = 00400040, hint = 00000040
msix index = 7, irq number = 127, cpu affinity mask = 00800080, hint = 00000080
msix index = 8, irq number = 128, cpu affinity mask = 00400040, hint = 00000100
msix index = 9, irq number = 129, cpu affinity mask = 00100010, hint = 00000200
msix index = 10, irq number = 130, cpu affinity mask = 00400040, hint = 00000400
msix index = 11, irq number = 131, cpu affinity mask = 00020002, hint = 00000800
msix index = 12, irq number = 132, cpu affinity mask = 00400040, hint = 00001000
msix index = 13, irq number = 133, cpu affinity mask = 00400040, hint = 00002000
msix index = 14, irq number = 134, cpu affinity mask = 00400040, hint = 00004000
msix index = 15, irq number = 135, cpu affinity mask = 00800080, hint = 00008000
msix index = 16, irq number = 136, cpu affinity mask = 00100010, hint = 00010000
msix index = 17, irq number = 137, cpu affinity mask = 00020002, hint = 00020000
msix index = 18, irq number = 138, cpu affinity mask = 00400040, hint = 00040000
msix index = 19, irq number = 139, cpu affinity mask = 00100010, hint = 00080000
msix index = 20, irq number = 140, cpu affinity mask = 00400040, hint = 00100000
msix index = 21, irq number = 141, cpu affinity mask = 00800080, hint = 00200000
msix index = 22, irq number = 142, cpu affinity mask = 00100010, hint = 00400000
msix index = 23, irq number = 143, cpu affinity mask = 00020002, hint = 00800000
msix index = 24, irq number = 144, cpu affinity mask = 00400040, hint = 01000000
msix index = 25, irq number = 145, cpu affinity mask = 00800080, hint = 02000000
msix index = 26, irq number = 146, cpu affinity mask = 00400040, hint = 04000000
msix index = 27, irq number = 147, cpu affinity mask = 00100010, hint = 08000000
msix index = 28, irq number = 148, cpu affinity mask = 00800080, hint = 10000000
msix index = 29, irq number = 149, cpu affinity mask = 00020002, hint = 20000000
msix index = 30, irq number = 150, cpu affinity mask = 00800080, hint = 40000000
msix index = 31, irq number = 151, cpu affinity mask = 00800080, hint = 80000000
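
To make the masks above easier to read, here is a minimal Python sketch that expands a hex mask into CPU numbers and checks it against the node-0 mask. The helper names (cpus_in_mask, is_within) are made up for illustration; only the mask values are taken from the listing.

```python
def cpus_in_mask(hex_mask):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    # Masks wider than 32 bits are printed with commas between 32-bit words.
    value = int(hex_mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def is_within(hex_mask, hex_allowed):
    """True if every CPU in hex_mask is also present in hex_allowed."""
    mask = int(hex_mask.replace(",", ""), 16)
    allowed = int(hex_allowed.replace(",", ""), 16)
    return mask & ~allowed == 0


NODE0 = "00FF00FF"  # node-0 CPU mask quoted in the listing

if __name__ == "__main__":
    print("node 0 cpus:", cpus_in_mask(NODE0))
    # The four distinct affinity masks that appear in the listing.
    for mask in ("00020002", "00100010", "00400040", "00800080"):
        print(mask, cpus_in_mask(mask), "inside node 0:", is_within(mask, NODE0))
```

For irq 120 this expands 00400040 to CPUs 6 and 22. Taken together, the four distinct masks in the listing cover only CPUs 1, 4, 6, 7, 17, 20, 22 and 23, all of them inside 00FF00FF, and every mask has two bits set.
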

When you say "Driver does not participate in sysfs enumeration" - does it
mean "numa_node" exposure in sysfs, or anything more than that ? Sorry for
the basic questions, and thanks for helping me understand things.
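
For comparing what the kernel exports with what irqbalance ended up applying, a sketch like the following may help. It reads /proc/irq/<irq>/smp_affinity and /proc/irq/<irq>/affinity_hint plus the numa_node and local_cpus files of the PCI device in sysfs. The function names and the root parameter (only there so the code can be pointed at a test tree) are illustrative, and the PCI address in the usage comment is a placeholder, not the address of the controller in this report.

```python
from pathlib import Path


def parse_mask(text):
    """Parse a kernel hex cpumask such as '00ff00ff' or '00000000,00ff00ff'."""
    return int(text.strip().replace(",", ""), 16)


def irq_report(irq, pci_addr, root=Path("/")):
    """Collect the affinity view of one irq and the locality of its device."""
    irq_dir = Path(root) / "proc" / "irq" / str(irq)
    dev_dir = Path(root) / "sys" / "bus" / "pci" / "devices" / pci_addr
    return {
        # Mask currently applied to the irq (what irqbalance wrote).
        "smp_affinity": parse_mask((irq_dir / "smp_affinity").read_text()),
        # Mask suggested by the driver (the "hint" column above).
        "affinity_hint": parse_mask((irq_dir / "affinity_hint").read_text()),
        # NUMA node reported for the device; -1 means unknown.
        "numa_node": int((dev_dir / "numa_node").read_text()),
        # CPUs the kernel considers local to the device.
        "local_cpus": parse_mask((dev_dir / "local_cpus").read_text()),
    }


# Example on a live Linux system (placeholder PCI address):
#   report = irq_report(120, "0000:02:00.0")
#   print({k: (hex(v) if k != "numa_node" else v) for k, v in report.items()})
```
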
` Kashyap
>
> > >
> > > What you're seeing looks like at least in part a bug with your (very
> > > old) version of irqbalance. I seem to recall fixing more than a few
> > > bugs dealing with affinity masks from the hint files and banned_cpu
> > > options. I strongly suggest that you test with an upstream version of
> > > irqbalance and contact oracle to update their version to something
> > > more recent.
> >
> > I see CPU lock up issue does not go if <rq_affinity> is set to 1 in
> > storage stack and if <irqbalance> policy set to <ignore>. With <ignore>
> > policy, I see only limited logical cpus of local NUMA node are busy
> > doing completion. We are still seeing many IO pumping from remote NUMA
> > node. This will cause CPU lockup as <rq_affinity> does not migrate
> > softirq to _exact_ submitter. Not sure what majority of h/w require
> > from <irqbalance> ? Is it <ignore> kind of policy good choice or
> > <subset> ?
> >
>
> I'm sorry, you'll have to try that again, I'm afraid I can't really parse
> what you just wrote there. I _think_ what you're saying is that you're
> observing irqbalance allowing cpu0 (or a small subset of cpus) to handle
> interrupts from your storage devices. As I said in my last note, I recall
> there being a bug about that which was fixed in a later version. I also
> note, however, that you mention above that you are using a policy script,
> which I'm guessing may have some culpability in terms of you having irqs
> with multi-bit affinity masks, which as I mentioned will not give you the
> expected behavior. If you post your policy script, I may be able to point
> out where you are going wrong.
>
> Neil
>
> > ` Kashyap
> >
> > >
> > > Regards
> > > Neil
> > >
> > > > ` Kashyap
> > > >
> > > > _______________________________________________
> > > > irqbalance mailing list
> > > > irqbalance at lists.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/irqbalance
> > > >