NVMe and IRQ Affinity, another problem
Young Yu
young.yu at northwestern.edu
Wed Apr 4 17:28:05 PDT 2018
Hello,
I know this is another run at an old topic, but I'm still wondering
what the right way is to bind the IRQs of NVMe PCI devices to the
cores in their local NUMA node. I'm using kernel 4.16.0-1.el7 on
CentOS 7.4, and the machine has 2 NUMA nodes, as shown below:
$ lscpu|grep NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23
I have 16 NVMe devices, 8 per NUMA node: nvme0 to nvme7 on NUMA node 0
and nvme8 to nvme15 on NUMA node 1. irqbalance was on by default. The
IRQs of these devices are all bound to cores 0 and 1, regardless of
where the devices are physically attached. affinity_hint still looks
invalid; however, there is an effective_affinity that matches where
some of the interrupts are bound. The cpu_list under mq pointed to the
wrong cores for the NVMe devices on NUMA node 1. I read that this was
fixed in kernel 4.3, so I'm not sure whether I’m looking at it the
right way.
Eventually, I’d like to know whether there is a way to distribute the
IRQs of the NVMe devices to different local cores in the NUMA node
each device is attached to.
e.g. nvme0 - cpu 0
nvme1 - cpu 2
...
nvme8 - cpu 1
nvme9 - cpu 3
...
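The layout above can be written down as a small sketch. It only computes which CPU each device would ideally use and does not touch any IRQ settings. The helper names pick_cpu and target_cpu are hypothetical, the CPU lists are copied from the lscpu output above, and the split of devices 0-7 on node 0 and 8-15 on node 1 is assumed to hold only for this machine.

```shell
#!/bin/sh
# CPU lists per NUMA node, copied from the lscpu output above.
NODE0="0,2,4,6,8,10,12,14,16,18,20,22"
NODE1="1,3,5,7,9,11,13,15,17,19,21,23"

# pick_cpu LIST K: print the K-th (zero-based) entry of a comma-separated list.
pick_cpu() {
    echo "$1" | cut -d, -f"$(( $2 + 1 ))"
}

# target_cpu DEV: desired CPU for nvme<DEV>, assuming devices 0-7 are on
# node 0 and devices 8-15 are on node 1 (true for this machine only).
target_cpu() {
    if [ "$1" -lt 8 ]; then
        pick_cpu "$NODE0" "$1"
    else
        pick_cpu "$NODE1" "$(( $1 - 8 ))"
    fi
}

for dev in 0 1 8 9; do
    echo "nvme$dev - cpu $(target_cpu "$dev")"
done
```

For devices 0, 1, 8 and 9 this prints the same pairs as the list above.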
Here is the output:
$ cat /sys/block/nvme0n1/device/device/numa_node
0
$ cat /sys/block/nvme8n1/device/device/numa_node
1
$ cat /proc/interrupts |grep nvme0q
143: 1777 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331648-edge nvme0q0, nvme0q1
152: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331649-edge nvme0q2
157: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331650-edge nvme0q3
160: 0 12773 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331651-edge nvme0q4
161: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331652-edge nvme0q5
162: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331653-edge nvme0q6
163: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331654-edge nvme0q7
$ cat /proc/interrupts |grep nvme8q
51: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827462-edge nvme8q7
54: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827457-edge nvme8q2
65: 13931 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827456-edge nvme8q0, nvme8q1
76: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827458-edge nvme8q3
87: 0 13380 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827459-edge nvme8q4
102: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827460-edge nvme8q5
117: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827461-edge nvme8q6
$ for i in $(grep nvme0q /proc/interrupts | cut -d":" -f1 | sed "s/ //g"); do
    echo "IRQ: $i"
    echo -n "HINT: " && cat /proc/irq/$i/affinity_hint
    echo -n "SMP: " && cat /proc/irq/$i/smp_affinity &&
    echo -n "EFF: " && cat /proc/irq/$i/effective_affinity
  done
IRQ: 143
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00055555,55555555,55555555
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 152
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000015,55555555,55555555,55500000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 157
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 555555,55555555,55555540,00000000,00000000,00000000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 160
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,2aaaaaaa,aaaaaaaa
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
IRQ: 161
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,0aaaaaaa,aaaaaaaa,80000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 162
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,02aaaaaa,aaaaaaaa,a0000000,00000000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 163
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: aaaaaa,aaaaaaaa,a8000000,00000000,00000000,00000000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
$ for i in $(grep nvme8q /proc/interrupts | cut -d":" -f1 | sed "s/ //g"); do
    echo "IRQ: $i"
    echo -n "HINT: " && cat /proc/irq/$i/affinity_hint
    echo -n "SMP: " && cat /proc/irq/$i/smp_affinity &&
    echo -n "EFF: " && cat /proc/irq/$i/effective_affinity
  done
IRQ: 51
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: aaaaaa,aaaaaaaa,a8000000,00000000,00000000,00000000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 54
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000015,55555555,55555555,55500000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 65
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00055555,55555555,55555555
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 76
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 555555,55555555,55555540,00000000,00000000,00000000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 87
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,2aaaaaaa,aaaaaaaa
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
IRQ: 102
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,0aaaaaaa,aaaaaaaa,80000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 117
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,02aaaaaa,aaaaaaaa,a0000000,00000000,00000000,00000000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
$ cat /sys/block/nvme8n1/mq/*/cpu_list
0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82
84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164
166, 168, 170, 172, 174, 176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246
1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61
63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123
125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 181, 183, 185
187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247
$ echo 000000,00000000,00000000,00000000,00000000,000aaaaa,aaaaaaaa,aaaaaaaa > /proc/irq/143/smp_affinity
bash: echo: write error: Input/output error
After looking at this, I built a 4.13.16 kernel from source to see
whether it behaves any differently from the ELRepo kernel. However,
the hint was still invalid, and interrupts are still bound to cores in
a different NUMA node, although they are more evenly distributed. I
was not able to set smp_affinity manually on either kernel.
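The SMP and EFF values in these listings are comma-separated hexadecimal CPU bitmasks, with CPU 0 as the least significant bit of the rightmost group. A minimal sketch for turning such a mask into a CPU list, so it can be compared against cpu_list, might look like the following. The function name mask_to_cpus is hypothetical and the helper only parses a string; it reads nothing from /proc.

```shell
#!/bin/sh
# mask_to_cpus MASK: print the CPU numbers whose bits are set in MASK,
# where MASK is a hex bitmask in the smp_affinity format (commas optional).
mask_to_cpus() {
    hex=$(printf '%s' "$1" | tr -d ',')
    cpu=0
    out=""
    # Walk the hex digits from right to left; each digit covers 4 CPUs.
    while [ -n "$hex" ]; do
        digit=${hex#"${hex%?}"}     # last character of the string
        hex=${hex%?}                # drop that character
        val=$(( 0x$digit ))
        for bit in 0 1 2 3; do
            if [ $(( (val >> bit) & 1 )) -eq 1 ]; then
                out="${out:+$out,}$cpu"
            fi
            cpu=$(( cpu + 1 ))
        done
    done
    echo "$out"
}

mask_to_cpus 00000055
mask_to_cpus 00000a80
```

The two calls decode masks of the form seen in the SMP lines for the 4.13.16 kernel.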
$ cat /proc/interrupts |grep nvme0q
62: 3687 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331648-edge nvme0q0, nvme0q1
123: 0 0 0 0 0 0 0 0 6642 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331649-edge nvme0q2
129: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 335 0 0 0 0 0 0 0 PCI-MSI 50331650-edge nvme0q3
142: 0 6426 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331651-edge nvme0q4
155: 0 0 0 0 0 0 0 4842 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331652-edge nvme0q5
167: 0 0 0 0 0 0 0 0 0 0 0 0 0 2895 0 0 0 0 0 0 0 0 0 0 PCI-MSI 50331653-edge nvme0q6
179: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3063 0 0 0 0 PCI-MSI 50331654-edge nvme0q7
$ cat /proc/interrupts |grep nvme8q
134: 21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827456-edge nvme8q0, nvme8q1
147: 0 0 0 0 0 0 0 0 1102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827457-edge nvme8q2
160: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 950 0 0 0 0 0 0 0 PCI-MSI 71827458-edge nvme8q3
172: 0 468 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827459-edge nvme8q4
181: 0 0 0 0 0 0 0 889 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827460-edge nvme8q5
187: 0 0 0 0 0 0 0 0 0 0 0 0 0 552 0 0 0 0 0 0 0 0 0 0 PCI-MSI 71827461-edge nvme8q6
191: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 470 0 0 0 0 PCI-MSI 71827462-edge nvme8q7
$ for i in $(grep nvme0q /proc/interrupts | cut -d":" -f1 | sed "s/ //g"); do
    echo "IRQ: $i"
    echo -n "HINT: " && cat /proc/irq/$i/affinity_hint
    echo -n "SMP: " && cat /proc/irq/$i/smp_affinity &&
    echo -n "EFF: " && cat /proc/irq/$i/effective_affinity
  done
IRQ: 62
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000055
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 123
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00005500
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
IRQ: 129
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00550000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
IRQ: 142
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,0000002a
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
IRQ: 155
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000a80
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
IRQ: 167
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,0002a000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
IRQ: 179
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00a80000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
$ for i in $(grep nvme8q /proc/interrupts | cut -d":" -f1 | sed "s/ //g"); do
    echo "IRQ: $i"
    echo -n "HINT: " && cat /proc/irq/$i/affinity_hint
    echo -n "SMP: " && cat /proc/irq/$i/smp_affinity &&
    echo -n "EFF: " && cat /proc/irq/$i/effective_affinity
  done
IRQ: 134
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000055
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
IRQ: 147
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00005500
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000100
IRQ: 160
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00550000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00010000
IRQ: 172
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,0000002a
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000002
IRQ: 181
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000a80
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000080
IRQ: 187
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,0002a000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00002000
IRQ: 191
HINT: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000
SMP: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00a80000
EFF: 000000,00000000,00000000,00000000,00000000,00000000,00000000,00080000
$ cat /sys/block/nvme8n1/mq/*/cpu_list
0, 2, 4, 6
8, 10, 12, 14
16, 18, 20, 22
1, 3, 5
7, 9, 11
13, 15, 17
19, 21, 23
# echo 000000,00000000,00000000,00000000,00000000,00000000,00000000,000000aa > /proc/irq/134/smp_affinity
bash: echo: write error: Input/output error
I also tried the manual configuration on another of our machines, but
I see the same problem there, except on kernel 4.4, where I can set
smp_affinity manually. With the same hardware setup, I cannot get it
to work on kernel 4.11 and still get the same Input/output error.
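For reference, the masks used in the manual attempts can be derived from a CPU list rather than typed by hand. The following is a minimal sketch with a hypothetical helper name, cpus_to_mask. It assumes all CPUs are numbered below 32, which holds for the 24 CPUs on this machine, and it prints a single 8-digit group rather than the full-width mask shown above. Whether the kernel accepts the shorter form is an assumption here, and the write itself may still fail with the same Input/output error.

```shell
#!/bin/sh
# cpus_to_mask LIST: print a hex affinity mask for a comma-separated CPU list.
# Limited to CPU numbers below 32 (one 32-bit group).
cpus_to_mask() {
    mask=0
    for cpu in $(echo "$1" | tr ',' ' '); do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%08x\n' "$mask"
}

# CPUs local to NUMA node 1 on this machine, taken from lscpu.
cpus_to_mask 1,3,5,7,9,11,13,15,17,19,21,23

# Intended use (as root, not run here):
#   cpus_to_mask 1,3,5,7 > /proc/irq/134/smp_affinity
```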
Se-young Yu
Northwestern University