nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Wed Jun 9 02:11:09 PDT 2021

> I'm not sure that using the queue_lock mutex will help.
> The race in this case is between sock_release and nvme_tcp_restore_sock_calls.
> sock_release is being called as part of nvme_tcp_free_queue, which is destroying the mutex.

Maybe I'm not understanding the issue here. What is the scenario again?
Is it that stop_queue is called (ctx1), which triggers error_recovery
(ctx2), which then calls free_queue before ctx1 gets to restore the
sock callbacks?

err_work will first stop the queues before freeing them, so it will
serialize behind ctx1. What am I missing?
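
To make the ordering argument concrete, here is a rough user-space model of it, using pthreads. It is only a sketch and not the driver code: every model_* name is hypothetical, and it assumes that stop_queue runs entirely under a per-queue mutex and that err_work never frees a queue it has not stopped first. Under those assumptions, whichever context loses the race for the lock finds the queue already stopped, so the callbacks can never be restored after the free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queue; not the kernel's struct. */
struct model_queue {
	pthread_mutex_t lock;    /* assumed to serialize stop_queue callers */
	bool live;               /* queue has not been stopped yet */
	bool callbacks_restored; /* sock callbacks were put back */
	bool freed;              /* free_queue has run */
	int restore_after_free;  /* counts occurrences of the suspected bug */
};

/* Only the first caller does the work; later callers see live == false. */
static void model_stop_queue(struct model_queue *q)
{
	pthread_mutex_lock(&q->lock);
	if (q->live) {
		q->live = false;
		if (q->freed)
			q->restore_after_free++;
		q->callbacks_restored = true;
	}
	pthread_mutex_unlock(&q->lock);
}

static void model_free_queue(struct model_queue *q)
{
	q->freed = true;
}

/* ctx1: a plain stop_queue caller. */
static void *model_ctx1(void *arg)
{
	model_stop_queue(arg);
	return NULL;
}

/* ctx2: error recovery stops the queue first, then frees it. */
static void *model_err_work(void *arg)
{
	struct model_queue *q = arg;

	model_stop_queue(q); /* waits here if ctx1 still holds the lock */
	model_free_queue(q);
	return NULL;
}

/* Returns 1 if the callbacks were restored and never after the free. */
static int model_run(void)
{
	struct model_queue q = { .live = true };
	pthread_t ctx1, ctx2;
	int ok;

	pthread_mutex_init(&q.lock, NULL);
	pthread_create(&ctx1, NULL, model_ctx1, &q);
	pthread_create(&ctx2, NULL, model_err_work, &q);
	pthread_join(ctx1, NULL);
	pthread_join(ctx2, NULL);
	pthread_mutex_destroy(&q.lock);

	ok = q.freed && q.callbacks_restored && q.restore_after_free == 0;
	return ok;
}
```

If the real flow has a path where free_queue runs without first going through stop_queue, or where stop_queue does part of its work outside the lock, this model does not apply and the race is real.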