[GIT PULL] nvme fix for 4.16-rc6
Jens Axboe
axboe at kernel.dk
Thu Mar 22 14:32:45 PDT 2018
On 3/22/18 3:09 PM, Keith Busch wrote:
> On Wed, Mar 21, 2018 at 03:44:32PM -0600, Jens Axboe wrote:
>>
>> [ 30.241598] nvme nvme2: pci function 0000:0b:00.0
>> [ 30.247205] nvme nvme3: pci function 0000:81:00.0
>> [ 30.252684] nvme nvme4: pci function 0000:82:00.0
>> [ 30.258144] nvme nvme5: pci function 0000:83:00.0
>> [ 30.263606] nvme nvme6: pci function 0000:84:00.0
>> [ 30.360555] nvme nvme3: could not set timestamp (8194)
>> [ 30.481649] nvme nvme6: Shutdown timeout set to 8 seconds
>> [ 38.790949] nvme nvme4: Device not ready; aborting initialisation
>> [ 38.797857] nvme nvme4: Removing after probe failure status: -19
>> [ 60.708816] nvme nvme3: I/O 363 QID 8 timeout, completion polled
>> [ 60.708820] nvme nvme6: I/O 781 QID 7 timeout, completion polled
>> [ 68.068772] nvme nvme2: I/O 769 QID 28 timeout, completion polled
>> [ 91.108626] nvme nvme6: I/O 781 QID 7 timeout, completion polled
>> [ 98.660581] nvme nvme2: I/O 769 QID 28 timeout, completion polled
>> [ 121.702691] nvme nvme6: I/O 100 QID 7 timeout, completion polled
>> [ 128.998648] nvme nvme3: I/O 387 QID 4 timeout, completion polled
>> [ 152.038523] nvme nvme6: I/O 781 QID 7 timeout, completion polled
>>
>> This is just doing an fdisk -l after load. No interrupts triggering,
>> looking at /proc/interrupts for the queues that timeout.
>
> So no interrupts triggered for the queues that timeout, but you are
> getting interrupts for other queues. Are there by chance many spurious
> interrupts for those other queues?
I picked one device, and I did:
for i in $(seq 0 47); do echo cpu $i; taskset -c $i dd if=/dev/nvme6n1 of=/dev/null bs=4k iflag=direct count=1; done
[...]
cpu 17
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 30.608 s, 0.1 kB/s
same slowness for CPU 19, 21, 23, 41, 43, 45, 47. Looking at dmesg:
[ 701.725191] nvme nvme6: I/O 912 QID 7 timeout, completion polled
[ 732.124646] nvme nvme6: I/O 896 QID 7 timeout, completion polled
[ 762.652479] nvme nvme6: I/O 840 QID 7 timeout, completion polled
[ 793.052259] nvme nvme6: I/O 754 QID 7 timeout, completion polled
[ 823.644082] nvme nvme6: I/O 339 QID 7 timeout, completion polled
[ 853.979878] nvme nvme6: I/O 86 QID 7 timeout, completion polled
[ 884.571686] nvme nvme6: I/O 997 QID 7 timeout, completion polled
[ 914.907506] nvme nvme6: I/O 694 QID 7 timeout, completion polled
we see them all timing out after 30s, and all being on QID 7, which
is hctx6. There are 8 hw queues (and interrupts), only some of
them seem to be actually triggering looking at /proc/interrupts.
There seems to be some mismatch. nvme6q7 is 244:
# cat /proc/irq/244/smp_affinity_list
49,51,53,55,57,59,61,63
and 243 is nvme6q6:
# cat /proc/irq/243/smp_affinity_list
17,19,21,23,41,43,45,47
244 has never triggered, if I do:
# taskset -c 17 dd if=/dev/nvme6n1 of=/dev/null bs=4k iflag=direct count=1
then look at interrupts, none of the nvme6 associated interrupts have
triggered.
--
Jens Axboe
More information about the Linux-nvme
mailing list