[GIT PULL] nvme fix for 4.16-rc6

Jens Axboe axboe at kernel.dk
Thu Mar 22 15:09:33 PDT 2018


On 3/22/18 4:02 PM, Keith Busch wrote:
> On Thu, Mar 22, 2018 at 03:32:45PM -0600, Jens Axboe wrote:
>> There seems to be some mismatch. nvme6q7 is 244:
>>
>> # cat /proc/irq/244/smp_affinity_list 
>> 49,51,53,55,57,59,61,63
>>
>> and 243 is nvme6q6:
>>
>> # cat /proc/irq/243/smp_affinity_list 
>> 17,19,21,23,41,43,45,47
>>
>> 244 has never triggered, if I do:
>>
>> # taskset -c 17 dd if=/dev/nvme6n1 of=/dev/null bs=4k iflag=direct count=1
>>
>> then look at interrupts, none of the nvme6 associated interrupts have
>> triggered.
> 
> Thanks, got it now: blk_mq_pci_map_queues() doesn't take pre_vectors into
> account, nor is there even a way for it to know about them. Some queues,
> then, get interrupt affinity assigned to "possible" CPUs that aren't
> online.
> 
> This approach will definitely need some more. Sorry for the trouble.
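The off-by-pre_vectors problem described above can be illustrated with a simplified model. This is a hypothetical sketch, not kernel code: the function names are invented, each MSI-X vector's affinity is assumed to be a 64-bit CPU bitmask, and the first `pre_vectors` vectors are assumed to be reserved (for example, for an admin queue) so that I/O queue q is actually serviced by vector q + pre_vectors.

```c
#include <stdint.h>

/*
 * Hypothetical model, not kernel code.
 * affinity[v]: bit n set means CPU n is in vector v's affinity mask.
 * Assumes cpu < 64 so the mask fits in a uint64_t.
 */

/* Ignores pre_vectors: queue q is (wrongly) matched against vector q. */
int queue_for_cpu_no_offset(const uint64_t *affinity, int nr_queues, int cpu)
{
    for (int q = 0; q < nr_queues; q++)
        if (affinity[q] & (UINT64_C(1) << cpu))
            return q;
    return 0; /* fallback when no mask contains this CPU */
}

/* Skips the reserved pre_vectors: queue q is matched against vector q + pre_vectors. */
int queue_for_cpu_with_offset(const uint64_t *affinity, int nr_queues,
                              int pre_vectors, int cpu)
{
    for (int q = 0; q < nr_queues; q++)
        if (affinity[q + pre_vectors] & (UINT64_C(1) << cpu))
            return q;
    return 0; /* fallback when no mask contains this CPU */
}
```

For example, with one pre_vector and two I/O queues, and masks {0xF, 0x3, 0xC} (vector 0 on all CPUs, vector 1 on CPUs 0-1, vector 2 on CPUs 2-3), CPU 2 is sent to queue 0 by the first function even though queue 0's interrupt fires on CPUs 0-1, while the second function sends it to queue 1, whose interrupt fires on CPUs 2-3.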

That brings up the question I had in the initial reply - what sort of
testing goes into patches, especially ones that are being pushed this
late in the game? My setup can't be so esoteric that nobody else
would hit this, since the issue is pretty generic.

This could have left nvme broken for tons of folks, potentially for
a final release. I know who would've gotten yelled at for that.

Improve your testing and quality control. This isn't good enough.

-- 
Jens Axboe




More information about the Linux-nvme mailing list