[PATCH] nvme-fabrics: fix crash for no IO queues

Sagi Grimberg sagi at grimberg.me
Mon Mar 15 17:08:58 GMT 2021


>>> A crash happens when Set Features (NVME_FEAT_NUM_QUEUES) times out during
>>> NVMe over RDMA (RoCE) reconnection, because a queue that was never
>>> allocated is then used.
>>>
>>> If a queue is not live, requests should not be queued on it.
>>
>> Can you describe exactly the scenario here? What is the state
>> here? LIVE? or DELETING?
> If setting the feature (NVME_FEAT_NUM_QUEUES) fails due to a timeout, or
> the target returns 0 I/O queues, nvme_set_queue_count will return 0,
> and the reconnection will then continue and succeed. The controller state
> is LIVE. Requests continue to be delivered by calling ->queue_rq(),
> and then the crash happens.

Thinking about this again, we should absolutely fail the reconnection
when we are unable to set any I/O queues; it is just wrong to
keep this controller alive...

This should be fixed for both rdma and tcp.
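
A minimal sketch of the kind of check meant here, written as a standalone
model and not as the actual driver code. Everything in it (struct ctrl,
set_queue_count(), reconnect(), the choice of -ENOMEM) is a hypothetical
stand-in; only the behaviour it models comes from the scenario described
above: the queue-count step reports success with a count of 0, and the
reconnect path must treat that as a failure instead of going LIVE.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; not the kernel's real structures. */
enum ctrl_state { CTRL_CONNECTING, CTRL_LIVE };

struct ctrl {
	enum ctrl_state state;
	unsigned int requested_io_queues;
	unsigned int granted_io_queues;   /* what the simulated target grants */
	bool set_features_times_out;      /* simulate a Set Features timeout */
};

/*
 * Models the behaviour described in the thread: a timeout, or a target
 * that grants zero I/O queues, is not reported as an error. The function
 * returns 0 and sets *count to 0.
 */
static int set_queue_count(const struct ctrl *c, unsigned int *count)
{
	if (c->set_features_times_out || c->granted_io_queues == 0) {
		*count = 0;
		return 0;
	}

	/* Otherwise hand back the smaller of requested and granted. */
	*count = c->granted_io_queues < c->requested_io_queues ?
		 c->granted_io_queues : c->requested_io_queues;
	return 0;
}

/*
 * Reconnect path with the extra check: if no I/O queues could be set,
 * fail the reconnection and leave the controller out of LIVE.
 */
static int reconnect(struct ctrl *c)
{
	unsigned int nr_io_queues = c->requested_io_queues;
	int ret;

	c->state = CTRL_CONNECTING;

	ret = set_queue_count(c, &nr_io_queues);
	if (ret)
		return ret;

	if (nr_io_queues == 0)
		return -ENOMEM;   /* error code picked for illustration only */

	c->state = CTRL_LIVE;
	return 0;
}
```

With this check, a reconnect that ends up with zero I/O queues returns an
error and the controller never reaches LIVE, so no request is dispatched
via ->queue_rq() to a queue that was not allocated.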


