do_IRQ: 5.33 No irq handler for vector
jianchao.wang
jianchao.w.wang at oracle.com
Wed Jan 24 01:54:05 PST 2018
Hi Keith
Thanks for your kind response.
On 01/24/2018 06:16 AM, Keith Busch wrote:
> On Tue, Jan 23, 2018 at 04:16:48PM +0800, jianchao.wang wrote:
>> Hi all
>>
>> I got the following log:
>> [ 446.908030] do_IRQ: 5.33 No irq handler for vector
>>
>> It appeared while running the following test:
>> loop fio job
>> size=256m
>> rw=randread
>> bs=4k
>> ioengine=libaio
>> iodepth=64
>> direct=1
>> numjobs=16
>> filename=/dev/nvme0n1
>>
>> and
>>
>> while true
>> do
>> echo 1 > /sys/block/nvme0n1/device/reset_controller
>> sleep 1
>> done
>>
>> Vector 33 is the one used by nvmeq2~nvmeq8 (8 CPUs on my machine).
>>
>> When the error log is printed, the reset_work is sleeping at
>> nvme_dev_disable
>>   -> nvme_disable_io_queues
>>     -> wait_for_completion_io_timeout
>>
>> In theory, the irq should have been masked by
>> nvme_suspend_queue
>>   -> pci_free_irq
>>     -> __free_irq        // if no other irq_action
>>       -> irq_shutdown
>>         -> __irq_disable
>>           -> mask_irq
>>             -> pci_msi_mask_irq
>>
>> Why is it still there?
>
> The message most likely indicates there is no struct irq_desc associated
> with the vector on this CPU.
>
> Even if the device happens to emit an MSI after we call pci_free_irq, we
> haven't disabled MSI at this point, so the struct irq_desc should still
> exist, even if disabled. Now it looks like this call stack will get to:
>
> __irq_domain_deactivate_irq
> x86_vector_deactivate
> clear_irq_vector
Yes.
pci_free_irq both masks the irq and clears the irq vector:
pci_free_irq
  -> __free_irq          // if no other irq_action
    -> irq_shutdown
      -> __irq_disable
        -> mask_irq
          -> pci_msi_mask_irq
      -> irq_domain_deactivate_irq
        -> __irq_domain_deactivate_irq
          -> x86_vector_deactivate
            -> clear_irq_vector
Could the MSI have been sent out before the irq was masked, and then interrupted
the CPU after the vector was cleared?
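
If that is what happens, the window would be: the device raises the MSI just
before pci_msi_mask_irq takes effect, the message is still in flight while
clear_irq_vector marks the per-CPU vector_irq[33] slot as VECTOR_UNUSED, and by
the time the CPU takes the interrupt there is no irq_desc left to dispatch to.
A rough sketch of the relevant do_IRQ logic on x86 (simplified, not verbatim
kernel code) shows where the message comes from:

/*
 * Simplified sketch of the x86 do_IRQ path (not verbatim kernel code):
 * the handler is looked up in the per-CPU vector_irq[] table, and if
 * clear_irq_vector() has already set that slot to VECTOR_UNUSED, the
 * "No irq handler for vector" message is all that can be done.
 */
__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
	struct irq_desc *desc;
	unsigned int vector = ~regs->orig_ax;	/* 33 in the log above */

	desc = __this_cpu_read(vector_irq[vector]);
	if (!IS_ERR_OR_NULL(desc)) {
		handle_irq(desc, regs);		/* normal dispatch */
	} else {
		ack_APIC_irq();
		pr_emerg_ratelimited("%s: %d.%d No irq handler for vector\n",
				     __func__, smp_processor_id(), vector);
	}
	return 1;
}
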
> Which sets the vector's desc to VECTOR_UNUSED, or NULL. Maybe we should
> disable the controller before freeing the irqs.
Yes, this would give the irqs a chance to be handled, and the outstanding
requests would be aborted and completed in the nvme_irq path.
I tested it and the log message was gone.
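
For context, "disable the controller" here means clearing CC.EN (or doing a
shutdown) and waiting for CSTS.RDY to drop, after which the device should not
signal any further MSIs; only then are the per-queue irqs freed. A minimal
register-level sketch of that step (a hypothetical helper for illustration,
not the driver's own code, with timeout handling elided):

/*
 * Illustration only: what disabling the controller amounts to at the
 * register level.  Once CSTS.RDY has dropped, the controller should no
 * longer generate MSIs, so pci_free_irq()/clear_irq_vector() can run
 * without an interrupt arriving for a vector that no longer exists.
 */
static void disable_ctrl_sketch(void __iomem *bar)
{
	u32 cc = readl(bar + NVME_REG_CC);

	cc &= ~NVME_CC_ENABLE;			/* CC.EN = 0 */
	writel(cc, bar + NVME_REG_CC);

	/* Wait for CSTS.RDY to clear (real code bounds this with CAP.TO). */
	while (readl(bar + NVME_REG_CSTS) & NVME_CSTS_RDY)
		msleep(1);
}
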
> We free the irqs first
> because we were tying that to mean a quiesced queue, but that was before
> we had a way to quiesce blk-mq.
Thanks
Jianchao