target crash / host hang with nvme-all.3 branch of nvme-fabrics

Sagi Grimberg sagi at grimberg.me
Thu Jun 16 14:40:02 PDT 2016


>>> How do we rely on that? __nvmet_rdma_queue_disconnect callers are
>>> responsible for queue_list deletion and queue the release. I don't
>>> see where are we getting it wrong.
>>
>> Thread 1:
>>
>> Moves the queues off nvmet_rdma_queue_list and onto the
>> local list in nvmet_rdma_delete_ctrl
>>
>> Thread 2:
>>
>> Gets into nvmet_rdma_cm_handler -> nvmet_rdma_queue_disconnect for one
>> of the queues now on the local list.  list_empty(&queue->queue_list)
>> evaluates to false because the queue is on the local list, and now we
>> have thread 1 and thread 2 racing to disconnect the queue.
>
> But the list removal and list_empty evaluation is still under a mutex,
> isn't that sufficient to avoid the race?

And we also have a mutual exclusion point inside 
__nvmet_rdma_queue_disconnect with queue->state_lock...


