BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

Sagi Grimberg sagi at grimberg.me
Sun Apr 8 03:58:49 PDT 2018


>>>> Hi Sagi
>>>>
>>>> Still can reproduce this issue with the change:
>>>
>>> Thanks for validating Yi,
>>>
>>> Would it be possible to test the following:
>>> --
>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>> index 75336848f7a7..81ced3096433 100644
>>> --- a/block/blk-mq.c
>>> +++ b/block/blk-mq.c
>>> @@ -444,6 +444,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>>>                  return ERR_PTR(-EXDEV);
>>>          }
>>>          cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
>>> +       if (cpu >= nr_cpu_ids) {
>>> +               pr_warn("no online cpu for hctx %d\n", hctx_idx);
>>> +               cpu = cpumask_first(alloc_data.hctx->cpumask);
>>> +       }
>>>          alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
>>>
>>>          rq = blk_mq_get_request(q, NULL, op, &alloc_data);
>>> --
>>> ...
>>>
>>>
>>>> [  153.384977] BUG: unable to handle kernel paging request at 00003a9ed053bd48
>>>> [  153.393197] IP: blk_mq_get_request+0x23e/0x390
>>>
>>> Also would it be possible to provide gdb output of:
>>>
>>> l *(blk_mq_get_request+0x23e)
>>
>> nvmf_connect_io_queue() is used in this way by asking blk-mq to allocate
>> a request from one specific hw queue, but there may not be all online
>> CPUs mapped to this hw queue.

Yes, this is what I suspect.

> And the following patchset may fail this kind of allocation and avoid
> the kernel oops.
> 
> 	https://marc.info/?l=linux-block&m=152318091025252&w=2

Thanks Ming,

But I don't want to fail the allocation. nvmf_connect_io_queue simply
needs a tag to issue the connect request, and I would much rather take
this tag from an online CPU than fail it. We use this because we reserve
a tag per queue for this, but in this case I'd rather block until the
inflight tag completes than fail the connect.
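
As a rough illustration of the fallback in the patch quoted above, here
is a minimal userspace sketch of the CPU selection. It is not kernel
code: plain 32-bit masks stand in for struct cpumask, and NR_CPU_IDS,
mask_first(), mask_first_and() and pick_cpu() are hypothetical names
that only mimic nr_cpu_ids, cpumask_first() and cpumask_first_and().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPU_IDS 8u	/* hypothetical CPU count for this sketch */

/*
 * Return the index of the lowest set bit, or NR_CPU_IDS if the mask is
 * empty. This mimics the "cpu >= nr_cpu_ids" convention that the patch
 * tests for after cpumask_first_and().
 */
static unsigned int mask_first(uint32_t mask)
{
	for (unsigned int cpu = 0; cpu < NR_CPU_IDS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return NR_CPU_IDS;
}

/* First CPU set in both masks, or NR_CPU_IDS if they do not intersect. */
static unsigned int mask_first_and(uint32_t a, uint32_t b)
{
	return mask_first(a & b);
}

/*
 * Pick a CPU for an allocation bound to one hw queue: prefer an online
 * CPU mapped to the queue, otherwise warn and fall back to the first
 * mapped CPU even though it is offline.
 */
static unsigned int pick_cpu(uint32_t hctx_mask, uint32_t online_mask)
{
	unsigned int cpu;

	/* Assumption: the hw queue has at least one CPU mapped to it. */
	assert(hctx_mask != 0);

	cpu = mask_first_and(hctx_mask, online_mask);
	if (cpu >= NR_CPU_IDS) {
		fprintf(stderr, "no online cpu for hctx\n");
		cpu = mask_first(hctx_mask);
	}
	return cpu;
}
```

For example, with hctx_mask 0x0c (CPUs 2 and 3) and online_mask 0x03
(CPUs 0 and 1), the intersection is empty, so pick_cpu() warns and
returns 2 instead of an out-of-range index.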


