nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Sagi Grimberg sagi at grimberg.me
Tue Jun 8 16:39:21 PDT 2021


> Hi Sagi,
> 
> A correction to the below analysis:
> It seems like sock->sk is NULL and not queue->sock
> 
> As part of _nvme_tcp_stop_queue, both kernel_sock_shutdown and
> nvme_tcp_restore_sock_calls are called.
> kernel_sock_shutdown leads to nvme_tcp_state_change, which triggers err_work (nvme_tcp_error_recovery_work).
> 
> As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is being called which releases the socket (sock_release)
> 
> In our case, based on the bt below:
> nvme_tcp_error_recovery_work (and so sock_release) runs before nvme_tcp_restore_sock_calls, which ends in a NULL pointer dereference at 'rwlock_t sk_callback_lock'?
> 
> Can you please review and provide your input on this potential race?

Seems that RH8.3 is missing the mutex protection on nvme_tcp_stop_queue.
I'm assuming it doesn't happen upstream?
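
To illustrate what mutex protection on the stop path buys, here is a minimal
userspace sketch, not the driver code. Every name in it (fake_sock,
fake_queue, fake_stop_queue, fake_free_queue) is hypothetical, and the pthread
mutex only stands in for a per-queue kernel mutex. The assumption is that the
stop path checks a "live" flag under the lock, so only one caller ever touches
the socket, and the free path goes through the same lock before it releases
the socket.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's types. */
struct fake_sock {
    int callbacks_restored;     /* set when the stop path "restores" callbacks */
};

struct fake_queue {
    pthread_mutex_t lock;       /* plays the role of a per-queue mutex */
    bool live;                  /* plays the role of a "queue is live" flag */
    struct fake_sock *sock;     /* dropped by the free path */
    int stops_done;             /* how many callers actually ran the stop body */
};

static void fake_queue_init(struct fake_queue *q, struct fake_sock *s)
{
    pthread_mutex_init(&q->lock, NULL);
    q->live = true;
    q->sock = s;
    q->stops_done = 0;
}

/*
 * Stop path: shutdown and callback restore happen as one unit under the
 * lock. A second caller (e.g. error recovery triggered by the shutdown)
 * blocks on the lock, then sees live == false and skips the body.
 */
static bool fake_stop_queue(struct fake_queue *q)
{
    bool did_stop = false;

    pthread_mutex_lock(&q->lock);
    if (q->live) {
        q->live = false;
        assert(q->sock != NULL);            /* socket cannot be gone yet */
        q->sock->callbacks_restored = 1;    /* stand-in for restoring callbacks */
        q->stops_done++;
        did_stop = true;
    }
    pthread_mutex_unlock(&q->lock);
    return did_stop;
}

/*
 * Free path: stop first, then drop the socket under the same lock, so the
 * release can never land in the middle of a stop that is still running.
 */
static void fake_free_queue(struct fake_queue *q)
{
    fake_stop_queue(q);
    pthread_mutex_lock(&q->lock);
    q->sock = NULL;                         /* stand-in for releasing the socket */
    pthread_mutex_unlock(&q->lock);
}
```

With this shape, a free that races with an in-flight stop has to wait for the
lock, so the callback restore always sees a valid socket. Without the lock,
the free can drop the socket between the shutdown and the restore, which is
the ordering described in the report.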


