[PATCH 0/1] irqbalance: Fix min_load to pick actual min load across all objects

Vallish Vaidyeshwara vallish at amazon.com
Tue Nov 14 11:20:33 PST 2017


We recently encountered a corner case bug in irqbalance where irqbalance
was not balancing interrupts across 2 CPU's.

The code snippet in question is:
static void gather_load_stats(struct topo_obj *obj, void *data)
{
	struct load_balance_info *info = data;
 
	if (info->min_load == 0 || obj->load < info->min_load)
		info->min_load = obj->load;
	info->total_load += obj->load;
	info->load_sources += 1;
}

The bug we encountered had 2 CPU's with load for the first CPU being 0.
Eg: obj1->load = 0 and obj2->load = 5000.

Iteration 1: obj1 is passed and info initialized to 0 is passed in
             as param.
	if (info->min_load == 0 || obj->load < info->min_load)
                info->min_load = obj->load;
Because info->min_load is 0, info->min_load is set to obj->load which
is also 0 in this case.

Iteration 2: obj2 is passed with info having values set from previous
             iteration:
	info->min_load = 0, info->total_load = 0, info->load_sources = 1

	if (info->min_load == 0 || obj->load < info->min_load)
                info->min_load = obj->load;
Because of the logic used in gather_load_stats() as shown above,
info->min_load gets set to obj2->load which is 5000. This is not the
minimum load. Because of this bug, interrupts do not migrate as this
value is checked in the function which migrates the interrupts.

Test results:
-------------
i3.large system - single socket, single core, hyperthreading enabled

With fix:
[ec2-user at ip-10-0-38-254 tmp]$ cat /proc/interrupts | grep "\<CPU0\>\|\<60\>\|\<61\>\|\<66\>\|\<67\>\|\<69\>\|\<70\>"
           CPU0       CPU1
 60:    1593819     292297  xen-pirq-msi-x     nvme0q0, nvme0q1
 61:     456537     831877  xen-pirq-msi-x     nvme0q2
 66:       1498       1644  xen-pirq-msi-x     eth1-Tx-Rx-0
 67:       1535       1034  xen-pirq-msi-x     eth1-Tx-Rx-1
 69:    3579919    8973629  xen-pirq-msi-x     eth2-Tx-Rx-0
 70:    9265764    3747096  xen-pirq-msi-x     eth2-Tx-Rx-1

Without fix:
[ec2-user at ip-10-0-10-75 ~]$ cat /proc/interrupts | grep "\<CPU0\>\|\<60\>\|\<61\>\|\<66\>\|\<67\>\|\<69\>\|\<70\>"
           CPU0       CPU1
 60:       4014    1088758  xen-pirq-msi-x     nvme0q0, nvme0q1
 61:       2046    2218741  xen-pirq-msi-x     nvme0q2
 66:         11       2978  xen-pirq-msi-x     eth1-Tx-Rx-0
 67:          8         43  xen-pirq-msi-x     eth1-Tx-Rx-1
 69:         43   13773325  xen-pirq-msi-x     eth2-Tx-Rx-0
 70:          4   13102621  xen-pirq-msi-x     eth2-Tx-Rx-1

Vallish Vaidyeshwara (1):
  irqbalance: Fix min_load to pick actual min load across all objects

 irqlist.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.7.5




More information about the irqbalance mailing list