BUG at IP: blk_mq_get_request+0x23e/0x390 on 4.16.0-rc7

Sagi Grimberg sagi at grimberg.me
Sun Apr 8 04:53:03 PDT 2018


>>>>>> Hi Sagi
>>>>>>
>>>>>> Still can reproduce this issue with the change:
>>>>>
>>>>> Thanks for validating Yi,
>>>>>
>>>>> Would it be possible to test the following:
>>>>> --
>>>>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>>>>> index 75336848f7a7..81ced3096433 100644
>>>>> --- a/block/blk-mq.c
>>>>> +++ b/block/blk-mq.c
>>>>> @@ -444,6 +444,10 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
>>>>>                   return ERR_PTR(-EXDEV);
>>>>>           }
>>>>>           cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
>>>>> +       if (cpu >= nr_cpu_ids) {
>>>>> +               pr_warn("no online cpu for hctx %d\n", hctx_idx);
>>>>> +               cpu = cpumask_first(alloc_data.hctx->cpumask);
>>>>> +       }
>>>>>           alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
>>>>>
>>>>>           rq = blk_mq_get_request(q, NULL, op, &alloc_data);
>>>>> --
>>>>> ...
>>>>>
>>>>>
>>>>>> [  153.384977] BUG: unable to handle kernel paging request at
>>>>>> 00003a9ed053bd48
>>>>>> [  153.393197] IP: blk_mq_get_request+0x23e/0x390
>>>>>
>>>>> Also would it be possible to provide gdb output of:
>>>>>
>>>>> l *(blk_mq_get_request+0x23e)
>>>>
>>>> nvmf_connect_io_queue() is used in this way: it asks blk-mq to allocate a
>>>> request from one specific hw queue, but that hw queue may not have any
>>>> online CPU mapped to it.
>>
>> Yes, this is what I suspect..
>>
>>> And the following patchset may fail this kind of allocation and avoid
>>> the kernel oops.
>>>
>>> 	https://marc.info/?l=linux-block&m=152318091025252&w=2
>>
>> Thanks Ming,
>>
>> But I don't want to fail the allocation: nvmf_connect_io_queue simply
>> needs a tag to issue the connect request, and I'd much rather take this
>> tag from an online cpu than fail it... We use this because we reserve
> 
> The failure is only triggered when there isn't any online CPU mapped to
> this hctx, so do you want to wait for a CPU mapped to this hctx to become
> online?

I was thinking of allocating a tag from that hctx even if it has no
online cpu; the execution is done on an online cpu anyway (hence the call
to blk_mq_alloc_request_hctx).
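
In other words, something like the hunk quoted above, annotated (just a
sketch of the intended behaviour, pr_warn omitted):

--
	/*
	 * The tag has to come from this specific hctx (the connect command
	 * is bound to that hw queue), but the allocation itself already
	 * runs on an online cpu, so any cpu in the hctx cpumask is good
	 * enough for picking the software ctx.
	 */
	cpu = cpumask_first_and(alloc_data.hctx->cpumask, cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		/* no online cpu mapped to this hctx, fall back to any cpu */
		cpu = cpumask_first(alloc_data.hctx->cpumask);
	alloc_data.ctx = __blk_mq_get_ctx(q, cpu);
--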

> Or maybe I misunderstand you, :-)

In the report we connected 40 hctxs (which was exactly the number of
online cpus); after Yi removed 3 cpus, we tried to connect 37 hctxs.
I'm not sure why some hctxs were left without any online cpus.

This seems to be related to the queue mapping.

Let's say I have a 4-cpu system and my device always allocates
num_online_cpus() hctxs.

at first I get:
cpu0 -> hctx0
cpu1 -> hctx1
cpu2 -> hctx2
cpu3 -> hctx3

When cpu1 goes offline I think the new mapping will be:
cpu0 -> hctx0
cpu1 -> hctx0 (from cpu_to_queue_index) // offline
cpu2 -> hctx2
cpu3 -> hctx0 (from cpu_to_queue_index)

This means that hctx1 is now left unmapped. I guess we can fix the nvmf
code not to connect it, but then we end up with fewer queues than cpus
for no good reason.

Optimally, I would want a different mapping that uses all the queues:
cpu0 -> hctx0
cpu2 -> hctx1
cpu3 -> hctx2
* cpu1 -> hctx1 (doesn't matter, offline)

Something looks broken...
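
To make the 4-cpu example concrete, here is a small userspace sketch
(plain C, just an illustration; the tables are copied from the example
above, this is not the real blk-mq mapping code) that reports which
hctxs end up without an online cpu under each mapping:

--
/*
 * mapping-sketch.c - illustrate the 4-cpu example above (4 hctxs,
 * cpu1 offline).  The tables are copied from the mail; this is not
 * the real blk-mq mapping code.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_CPUS   4
#define NR_HCTX   4

static const bool cpu_online[NR_CPUS] = { true, false, true, true };

/* mapping observed after cpu1 goes offline (cpu_to_queue_index fallback) */
static const int observed_map[NR_CPUS] = { 0, 0, 2, 0 };

/* mapping I would have wanted: spread the online cpus over the queues */
static const int desired_map[NR_CPUS] = { 0, 1, 1, 2 };

static void report(const char *name, const int *map)
{
	bool covered[NR_HCTX] = { false };
	int cpu, hctx;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_online[cpu])
			covered[map[cpu]] = true;

	printf("%s:\n", name);
	for (hctx = 0; hctx < NR_HCTX; hctx++)
		printf("  hctx%d: %s\n", hctx,
		       covered[hctx] ? "has an online cpu" : "no online cpu");
}

int main(void)
{
	report("observed mapping", observed_map);
	report("desired mapping", desired_map);
	return 0;
}
--

With the observed table only hctx0 and hctx2 are covered, while the
desired table leaves only the surplus hctx3 uncovered.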


