nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Engel, Amit Amit.Engel at Dell.com
Sun Jun 13 01:35:56 PDT 2021


Hi Sagi, after revisiting the upstream code, I agree with your analysis.
We are now trying to run the RHEL 8.3 NVMe host with the missing mutex_lock commit applied.

Anyway, I applied a patch that takes the same mutex_lock also for the start_queue failure case.

In that case, nvme_tcp_start_queue calls __nvme_tcp_stop_queue, which should be protected by the same mutex.
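
For reference, the change is roughly along these lines (a sketch against drivers/nvme/host/tcp.c, assuming the per-queue queue_lock mutex is present; this is not the exact patch we applied):

static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
{
	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
	struct nvme_tcp_queue *queue = &ctrl->queues[idx];
	int ret;

	ret = idx ? nvmf_connect_io_queue(nctrl, idx, false) :
		    nvmf_connect_admin_queue(nctrl);
	if (!ret) {
		set_bit(NVME_TCP_Q_LIVE, &queue->flags);
	} else {
		/*
		 * Tear down the queue that failed to connect under the same
		 * per-queue mutex that nvme_tcp_stop_queue() takes, so this
		 * cannot race with err_work stopping and freeing the socket.
		 */
		mutex_lock(&queue->queue_lock);
		if (test_bit(NVME_TCP_Q_ALLOCATED, &queue->flags))
			__nvme_tcp_stop_queue(queue);
		mutex_unlock(&queue->queue_lock);
		dev_err(nctrl->device,
			"failed to connect queue: %d ret=%d\n", idx, ret);
	}
	return ret;
}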

Thank you for your help
Amit

-----Original Message-----
From: Sagi Grimberg <sagi at grimberg.me> 
Sent: Thursday, June 10, 2021 11:03 PM
To: Engel, Amit; linux-nvme at lists.infradead.org
Cc: Anner, Ran; Grupi, Elad
Subject: Re: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230


> Correct, free_queue is being called (sock->sk becomes NULL) before 
> restore_sock_calls
> 
> When restore_sock_calls is called, we fail on 'write_lock_bh(&sock->sk->sk_callback_lock)'
> 
> NULL pointer dereference at 0x230 → 560 decimal
> crash> struct sock -o
> struct sock {
>     [0] struct sock_common __sk_common;
>     ...
>     [560] rwlock_t sk_callback_lock;
> 
> Stopping the queue in ctx2 does not really do anything, since the 'NVME_TCP_Q_LIVE' bit is already cleared (by ctx1).
> Can you please explain how stopping the queue before freeing it helps to serialize against ctx1?

What I understood from your description is:
1. ctx1 calls stop_queue - calls kernel_sock_shutdown
2. ctx1 gets to restore_sock_calls (just before calling it)
3. ctx2 is triggered from state_change - scheduling err_work
4. ctx2 does stop_queues
5. ctx2 calls destroy_queues -> which does sock_release
6. ctx1 makes forward progress and accesses an already freed sk

Hence, with the mutex protection, ctx2 is serialized at step (4): it cannot proceed to step (5) until ctx1 releases the mutex, which happens only after ctx1 has finished touching the socket in step (6).
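
To illustrate, here is a simplified sketch of the upstream nvme_tcp_stop_queue with the per-queue mutex (abbreviated, not the exact code):

static void nvme_tcp_stop_queue(struct nvme_ctrl *nctrl, int qid)
{
	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
	struct nvme_tcp_queue *queue = &ctrl->queues[qid];

	mutex_lock(&queue->queue_lock);
	if (test_and_clear_bit(NVME_TCP_Q_LIVE, &queue->flags))
		__nvme_tcp_stop_queue(queue);	/* shutdown + restore_sock_calls */
	mutex_unlock(&queue->queue_lock);
}

So even though ctx2 finds NVME_TCP_Q_LIVE already cleared and skips __nvme_tcp_stop_queue, it still has to take the mutex, which parks it behind ctx1; only after ctx1 has restored the sock callbacks and dropped the mutex can ctx2 move on to free_queue and sock_release.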

But maybe I'm not interpreting this correctly?

