Oops when completing request on the wrong queue
Gabriel Krisman Bertazi
krisman at linux.vnet.ibm.com
Thu Aug 11 11:10:35 PDT 2016
Keith Busch <keith.busch at intel.com> writes:
> On Wed, Aug 10, 2016 at 01:04:35AM -0300, Gabriel Krisman Bertazi wrote:
>> Hi,
>>
>> We at IBM have been seeing an occasional Oops when stressing IO while
>> adding and removing processors. The Oops happens in the IRQ path,
>> when we try to complete a request that was apparently meant for another
>> queue.
>>
>> In __nvme_process_cq, the driver uses the cqe.command_id and the
>> nvmeq->tags to find, via blk_mq_tag_to_rq, the request that
>> initiated the IO. Occasionally, the request returned by that
>> function is not initialized, and we crash inside
>> __blk_mq_complete_request, as shown below.
>
> Could you try the following patch and see if it resolves the issue?
Hi Keith,

Thanks for your response. I had already tried this exact change on 4.7,
with no effect. Do you think trying it on 4.8-rc1 would give better
results?

I also verified that in __nvme_process_cq the iod points to the same
queue that queued the command, as expected. In nvme_timeout, however,
according to the log I sent earlier, it points to a different nvmeq
(different nvmeq->qid). This is very strange to me.
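
To make the failure mode concrete, here is a minimal userspace sketch of
the lookup described above: a command_id is resolved against one queue's
tag table, and the result is only meaningful if that queue is the one
that issued the command. This is an illustrative model, not the driver's
code. The names sim_request, sim_queue and every sim_* function are
hypothetical, and the validity check is an assumption about what a
defensive completion path could test, not something the kernel is known
to do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_DEPTH 4

/* Hypothetical stand-in for a block-layer request. */
struct sim_request {
    int started;     /* nonzero once the request has actually been issued */
    uint16_t qid;    /* id of the queue that issued it */
};

/* Hypothetical stand-in for an NVMe queue with its own tag table. */
struct sim_queue {
    uint16_t qid;
    struct sim_request rqs[QUEUE_DEPTH];   /* tag -> request */
};

/* Zero the tag table so unused slots look "not initialized". */
void sim_queue_init(struct sim_queue *q, uint16_t qid)
{
    memset(q, 0, sizeof(*q));
    q->qid = qid;
}

/* Issue a command on q using slot 'tag'; the tag doubles as command_id. */
uint16_t sim_submit(struct sim_queue *q, uint16_t tag)
{
    q->rqs[tag].started = 1;
    q->rqs[tag].qid = q->qid;
    return tag;
}

/* Resolve a command_id against this queue's tag table (NULL if out of range). */
struct sim_request *sim_tag_to_rq(struct sim_queue *q, uint16_t command_id)
{
    if (command_id >= QUEUE_DEPTH)
        return NULL;
    return &q->rqs[command_id];
}

/*
 * Return 1 if the completion can safely be processed on q, 0 otherwise.
 * A command_id looked up on the wrong queue lands on a slot that was
 * never started there, which is the situation reported in this thread.
 */
int sim_completion_is_valid(struct sim_queue *q, uint16_t command_id)
{
    struct sim_request *rq = sim_tag_to_rq(q, command_id);

    return rq != NULL && rq->started && rq->qid == q->qid;
}
```

With two queues initialized and a command submitted only on the first,
sim_completion_is_valid() accepts the command_id on the issuing queue and
rejects the same command_id on the other one.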
--
Gabriel Krisman Bertazi