target crash / host hang with nvme-all.3 branch of nvme-fabrics
Christoph Hellwig
hch at lst.de
Tue Jun 21 09:01:34 PDT 2016
On Fri, Jun 17, 2016 at 12:37:18AM +0300, Sagi Grimberg wrote:
>> to false because the queue is on the local list, and now we have thread 1
>> and 2 racing for disconnecting the queue.
>
> But the list removal and list_empty evaluation is still under a mutex,
> isn't that sufficient to avoid the race?
If only one side takes the lock it's not very helpful. We can
execute nvmet_rdma_queue_disconnect from the CM handler while the
queue sits on the local to-be-removed list, which creates two
issues: a) we manipulate the local del_list without any knowledge
of the thread calling nvmet_rdma_delete_ctrl, leading to potential
list corruption, and b) we can call into __nvmet_rdma_queue_disconnect
concurrently. As you pointed out, we still hold the per-queue
state_lock inside __nvmet_rdma_queue_disconnect, so b) is probably
harmless at the moment, as long as the queue hasn't already been
freed by one of the racing threads, which is fairly unlikely.
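
To make the race concrete, here is a minimal sketch of the two
paths as I read them in this tree (simplified; exact field names
and details are approximate):

static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl)
{
	struct nvmet_rdma_queue *queue, *next;
	LIST_HEAD(del_list);

	mutex_lock(&nvmet_rdma_queue_mutex);
	list_for_each_entry_safe(queue, next,
			&nvmet_rdma_queue_list, queue_list) {
		if (queue->nvme_sq.ctrl == ctrl)
			list_move_tail(&queue->queue_list, &del_list);
	}
	mutex_unlock(&nvmet_rdma_queue_mutex);

	/*
	 * The queues now sit on the local del_list, still linked,
	 * and nothing protects this walk against the CM handler.
	 */
	list_for_each_entry_safe(queue, next, &del_list, queue_list)
		nvmet_rdma_queue_disconnect(queue);
}

/* also called from the CM event handler: */
static void nvmet_rdma_queue_disconnect(struct nvmet_rdma_queue *queue)
{
	bool disconnect = false;

	mutex_lock(&nvmet_rdma_queue_mutex);
	if (!list_empty(&queue->queue_list)) {
		/*
		 * Wrongly true for a queue already moved to del_list:
		 * this list_del_init edits the other thread's local
		 * list, and both threads go on to disconnect.
		 */
		list_del_init(&queue->queue_list);
		disconnect = true;
	}
	mutex_unlock(&nvmet_rdma_queue_mutex);

	if (disconnect)
		__nvmet_rdma_queue_disconnect(queue);
}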
Either way - using list_empty to check whether an object is still
alive by virtue of being linked into a list, and splicing entries
onto a local dispose list, simply don't mix: once an entry has been
moved to the local list it is still linked, so the list_empty check
keeps reporting it as live even though another thread already owns
it. Both are useful patterns on their own, but should not be
combined.
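
One way to avoid the mix entirely - a sketch, not a tested patch -
is to drop the local dispose list and instead unlink and disconnect
one queue at a time, restarting the scan after each unlock:

static void nvmet_rdma_delete_ctrl(struct nvmet_ctrl *ctrl)
{
	struct nvmet_rdma_queue *queue;

restart:
	mutex_lock(&nvmet_rdma_queue_mutex);
	list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) {
		if (queue->nvme_sq.ctrl == ctrl) {
			/*
			 * Unlinking under the mutex keeps the
			 * list_empty liveness check in the CM handler
			 * meaningful: whoever unlinks first owns the
			 * disconnect.
			 */
			list_del_init(&queue->queue_list);
			mutex_unlock(&nvmet_rdma_queue_mutex);
			__nvmet_rdma_queue_disconnect(queue);
			goto restart;
		}
	}
	mutex_unlock(&nvmet_rdma_queue_mutex);
}

That keeps every list manipulation under nvmet_rdma_queue_mutex, so
the losing thread simply sees an empty list node and backs off.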