nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Wed Jun 2 05:28:46 PDT 2021

Hi Sagi,

A correction to the below analysis:
It seems like sock->sk is NULL and not queue->sock

As part of _nvme_tcp_stop_queue
kernel_sock_shutdown and nvme_tcp_restore_sock_calls are being called:
kernel_sock_shutdown leads to nvme_tcp_state_change which will trigger err_work (nvme_tcp_error_recovery_work)

As part of nvme_tcp_error_recovery_work, nvme_tcp_free_queue is being called which releases the socket (sock_release)

In our case, based on the below bt:
nvme_tcp_error_recovery_work is being triggered (and so sock_release) before nvme_tcp_restore_sock_calls , which end up with NULL dereference pointer at 'rwlock_t sk_callback_lock' ?

Can you please review and provide your inputs for this potential race ?

Thanks
Amit Engel

-----Original Message-----
From: Engel, Amit 
Sent: Tuesday, June 1, 2021 8:52 PM
To: linux-nvme at lists.infradead.org; sagi at grimberg.me
Cc: Engel, Amit
Subject: nvme_tcp BUG: unable to handle kernel NULL pointer dereference at 0000000000000230

Hello,

We hit the below kernel panic “BUG: unable to handle kernel NULL pointer dereference at 0000000000000230”
when running with RHEL8.3 host. This happens after we reboot the target side application (when multiple ctrls/connections exist) Based on vmcore analysis it seems that when nvme_tcp_restore_sock_calls is called (from __nvme_tcp_stop_queue)
queue->sock is NULL

Are you familiar with such an issue ?

crash> bt
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255

crash> bt -l
PID: 193053  TASK: ffff9491bdad17c0  CPU: 7   COMMAND: "kworker/u193:9"
 #0 [ffffb2e9cfdbbb70] machine_kexec at ffffffffb245bf3e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/machine_kexec_64.c: 389
 #1 [ffffb2e9cfdbbbc8] __crash_kexec at ffffffffb256072d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kexec_core.c: 956
 #2 [ffffb2e9cfdbbc90] crash_kexec at ffffffffb256160d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 219
 #3 [ffffb2e9cfdbbca8] oops_end at ffffffffb2422d4d
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/kernel/dumpstack.c: 334
 #4 [ffffb2e9cfdbbcc8] no_context at ffffffffb246ba9e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/mm/fault.c: 773
 #5 [ffffb2e9cfdbbd20] do_page_fault at ffffffffb246c5c2
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
 #6 [ffffb2e9cfdbbd50] page_fault at ffffffffb2e0122e
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 1183
    [exception RIP: _raw_write_lock_bh+23]
    RIP: ffffffffb2cd6cc7  RSP: ffffb2e9cfdbbe00  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff94b2aefb4000  RCX: 0000000000000003
    RDX: 00000000000000ff  RSI: 00000000fffffe01  RDI: 0000000000000230
    RBP: ffff94923f793f40   R8: ffff9492ff1ea7f8   R9: 0000000000000000
    R10: 0000000000000000  R11: ffff9492ff1e8c64  R12: ffff94b2b7210338
    R13: 0000000000000000  R14: ffff94b27f7a4100  R15: ffff94b2b72110a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/atomic.h: 194
 #7 [ffffb2e9cfdbbe00] __nvme_tcp_stop_queue at ffffffffc02dc0aa [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1486
 #8 [ffffb2e9cfdbbe18] nvme_tcp_start_queue at ffffffffc02dcd18 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1525
 #9 [ffffb2e9cfdbbe38] nvme_tcp_setup_ctrl at ffffffffc02df258 [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1814
#10 [ffffb2e9cfdbbe80] nvme_tcp_reconnect_ctrl_work at ffffffffc02df4bf [nvme_tcp]
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/drivers/nvme/host/tcp.c: 1962
#11 [ffffb2e9cfdbbe98] process_one_work at ffffffffb24d3477
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./arch/x86/include/asm/jump_label.h: 38
#12 [ffffb2e9cfdbbed8] worker_thread at ffffffffb24d3b40
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/./include/linux/compiler.h: 193
#13 [ffffb2e9cfdbbf10] kthread at ffffffffb24d9502
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/kernel/kthread.c: 280
#14 [ffffb2e9cfdbbf50] ret_from_fork at ffffffffb2e00255
    /usr/src/debug/kernel-4.18.0-240.el8/linux-4.18.0-240.el8.x86_64/arch/x86/entry/entry_64.S: 360

nvme/host/tcp.c
..snip
1481 static void nvme_tcp_restore_sock_calls(struct nvme_tcp_queue *queue)
1482 {
1483 >-------struct socket *sock = queue->sock;
1484
1485 >-------write_lock_bh(&sock->sk->sk_callback_lock);
1486 >-------sock->sk->sk_user_data  = NULL;
1487 >-------sock->sk->sk_data_ready = queue->data_ready;
1488 >-------sock->sk->sk_state_change = queue->state_change;
1489 >-------sock->sk->sk_write_space  = queue->write_space;
1490 >-------write_unlock_bh(&sock->sk->sk_callback_lock);
1491 }
..snip

NULL pointer dereference at 0x230 → 560 decimal
crash> struct sock -o
struct sock {
   [0] struct sock_common __sk_common;
   …
   ...
   …
   [560] rwlock_t sk_callback_lock;