kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012

Chaitanya Kulkarni Chaitanya.Kulkarni at wdc.com
Sun Feb 7 00:48:22 EST 2021


On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates:
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe at kernel.dk>
> Date:   Tue Feb 2 07:12:06 2021 -0700
>
>      Merge branch 'for-5.12/drivers' into for-next
>
>      * for-5.12/drivers: (22 commits)
>        nvme-tcp: use cancel tagset helper for tear down
>        nvme-rdma: use cancel tagset helper for tear down
>        nvme-tcp: add clean action for failed reconnection
>        nvme-rdma: add clean action for failed reconnection
>        nvme-core: add cancel tagset helpers
>        nvme-core: get rid of the extra space
>        nvme: add tracing of zns commands
>        nvme: parse format nvm command details when tracing
>        nvme: update enumerations for status codes
>        nvmet: add lba to sect conversion helpers
>        nvmet: remove extra variable in identify ns
>        nvmet: remove extra variable in id-desclist
>        nvmet: remove extra variable in smart log nsid
>        nvme: refactor ns->ctrl by request
>        nvme-tcp: pass multipage bvec to request iov_iter
>        nvme-tcp: get rid of unused helper function
>        nvme-tcp: fix wrong setting of request iov_iter
>        nvme: support command retry delay for admin command
>        nvme: constify static attribute_group structs
>        nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
>
Running the test a few times, I got this once; no sign of an Oops, though:-


# nvme_trtype=tcp ./check nvme/012
nvme/012 (run mkfs and data verification fio job on NVMeOF block
device-backed ns) [failed]
    runtime  37.624s  ...  42.272s
    something found in dmesg:
    [   69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
    [   69.330277] loop: module loaded
    [   69.333383] loop0: detected capacity change from 2097152 to 0
    [   69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
    [   69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
    [   69.396581] nvmet: creating controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
    [   69.399482] nvme nvme1: creating 64 I/O queues.
    [   69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
    [   69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1",
addr 127.0.0.1:4420
    [   72.381282] XFS (nvme1n1): Mounting V5 Filesystem
    ...
    (See '/root/blktests/results/nodev/nvme/012.dmesg' for the entire
message)
blktests (master) # dmesg  -c
[   38.347661] nvme nvme0: 63/0/0 default/read/poll queues
[   69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
[   69.330277] loop: module loaded
[   69.333383] loop0: detected capacity change from 2097152 to 0
[   69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   69.396581] nvmet: creating controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
[   69.399482] nvme nvme1: creating 64 I/O queues.
[   69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
[   69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420
[   72.381282] XFS (nvme1n1): Mounting V5 Filesystem
[   72.394603] XFS (nvme1n1): Ending clean mount
[   72.395463] xfs filesystem being mounted at /mnt/blktests supports
timestamps until 2038 (0x7fffffff)

[   73.402838] ======================================================
[   73.403491] WARNING: possible circular locking dependency detected
[   73.404125] 5.11.0-rc6blk+ #166 Tainted: G           OE   
[   73.404690] ------------------------------------------------------
[   73.405332] fio/3671 is trying to acquire lock:
[   73.405803] ffff88881d110cb0 (sk_lock-AF_INET){+.+.}-{0:0}, at:
tcp_sendpage+0x23/0x50
[   73.406627]
but task is already holding lock:
[   73.407246] ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3},
at: xfs_ilock+0xbf/0x250 [xfs]
[   73.408233]
which lock already depends on the new lock.

[   73.409096]
the existing dependency chain (in reverse order) is:
[   73.409885]
-> #3 (&xfs_dir_ilock_class/5){+.+.}-{3:3}:
[   73.410594]        lock_acquire+0xd2/0x390
[   73.411045]        down_write_nested+0x47/0x110
[   73.411530]        xfs_ilock+0xbf/0x250 [xfs]
[   73.412068]        xfs_create+0x1d9/0x6b0 [xfs]
[   73.412626]        xfs_generic_create+0x205/0x2c0 [xfs]
[   73.413255]        lookup_open+0x4f6/0x630
[   73.413710]        path_openat+0x298/0xa80
[   73.414158]        do_filp_open+0x93/0x100
[   73.414606]        do_sys_openat2+0x24b/0x310
[   73.415090]        do_sys_open+0x4b/0x80
[   73.415569]        do_syscall_64+0x33/0x40
[   73.416075]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.416747]
-> #2 (sb_internal){.+.+}-{0:0}:
[   73.417436]        lock_acquire+0xd2/0x390
[   73.417938]        xfs_trans_alloc+0x19e/0x2e0 [xfs]
[   73.418607]        xfs_vn_update_time+0xc8/0x250 [xfs]
[   73.419295]        touch_atime+0x16b/0x200
[   73.419814]        xfs_file_mmap+0xa7/0xb0 [xfs]
[   73.420447]        mmap_region+0x3f6/0x690
[   73.420955]        do_mmap+0x379/0x580
[   73.421415]        vm_mmap_pgoff+0xdf/0x170
[   73.421929]        ksys_mmap_pgoff+0x1dd/0x240
[   73.422477]        do_syscall_64+0x33/0x40
[   73.422980]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.423672]
-> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[   73.424426]        lock_acquire+0xd2/0x390
[   73.424932]        __might_fault+0x5e/0x80
[   73.425433]        _copy_from_user+0x1e/0xa0
[   73.425966]        do_ip_setsockopt.isra.15+0xbba/0x12f0
[   73.426616]        ip_setsockopt+0x34/0x90
[   73.427120]        __sys_setsockopt+0x8f/0x110
[   73.427681]        __x64_sys_setsockopt+0x20/0x30
[   73.428252]        do_syscall_64+0x33/0x40
[   73.428751]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.429422]
-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
[   73.430154]        validate_chain+0x8fa/0xec0
[   73.430681]        __lock_acquire+0x563/0x940
[   73.431212]        lock_acquire+0xd2/0x390
[   73.431719]        lock_sock_nested+0x72/0xa0
[   73.432252]        tcp_sendpage+0x23/0x50
[   73.432740]        inet_sendpage+0x4f/0x80
[   73.433240]        kernel_sendpage+0x57/0xc0
[   73.433760]        nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[   73.434435]        nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[   73.435104]        __blk_mq_try_issue_directly+0x109/0x1b0
[   73.435768]        blk_mq_request_issue_directly+0x4b/0x80
[   73.436433]        blk_mq_try_issue_list_directly+0x62/0xf0
[   73.437104]        blk_mq_sched_insert_requests+0x192/0x2b0
[   73.437774]        blk_mq_flush_plug_list+0x13c/0x260
[   73.438389]        blk_flush_plug_list+0xd7/0x100
[   73.438960]        blk_finish_plug+0x21/0x30
[   73.439485]        _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[   73.440171]        __xfs_buf_submit+0x85/0x220 [xfs]
[   73.440833]        xfs_buf_read_map+0x105/0x2a0 [xfs]
[   73.441511]        xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[   73.442254]        xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.442874]        xfs_ialloc_read_agi+0x48/0x170 [xfs]
[   73.443568]        xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[   73.444277]        xfs_dir_ialloc+0x72/0x630 [xfs]
[   73.444927]        xfs_create+0x241/0x6b0 [xfs]
[   73.445550]        xfs_generic_create+0x205/0x2c0 [xfs]
[   73.446255]        lookup_open+0x4f6/0x630
[   73.446759]        path_openat+0x298/0xa80
[   73.447268]        do_filp_open+0x93/0x100
[   73.447770]        do_sys_openat2+0x24b/0x310
[   73.448301]        do_sys_open+0x4b/0x80
[   73.448779]        do_syscall_64+0x33/0x40
[   73.449275]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.449944]
other info that might help us debug this:

[   73.450899] Chain exists of:
  sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

[   73.452222]  Possible unsafe locking scenario:

[   73.452931]        CPU0                    CPU1
[   73.453479]        ----                    ----
[   73.454024]   lock(&xfs_dir_ilock_class/5);
[   73.454528]                                lock(sb_internal);
[   73.455217]                                lock(&xfs_dir_ilock_class/5);
[   73.456027]   lock(sk_lock-AF_INET);
[   73.456463]
 *** DEADLOCK ***

[   73.457175] 6 locks held by fio/3671:
[   73.457618]  #0: ffff8881185a9488 (sb_writers#9){.+.+}-{0:0}, at:
path_openat+0xa61/0xa80
[   73.458604]  #1: ffff888128cabfe0
(&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at:
path_openat+0x287/0xa80
[   73.459847]  #2: ffff8881185a96a8 (sb_internal){.+.+}-{0:0}, at:
xfs_create+0x1b4/0x6b0 [xfs]
[   73.460926]  #3: ffff888128cabcf0
(&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs]
[   73.462100]  #4: ffff88881bd54c58 (hctx->srcu){....}-{0:0}, at:
hctx_lock+0x62/0xe0
[   73.463029]  #5: ffff888142ab29d0 (&queue->send_mutex){+.+.}-{3:3},
at: nvme_tcp_queue_rq+0x2fc/0x330 [nvme_tcp]
[   73.464288]
stack backtrace:
[   73.464824] CPU: 16 PID: 3671 Comm: fio Tainted: G           OE    
5.11.0-rc6blk+ #166
[   73.465785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   73.467176] Call Trace:
[   73.467483]  dump_stack+0x8d/0xb5
[   73.467910]  check_noncircular+0x119/0x130
[   73.468413]  ? save_trace+0x3d/0x2c0
[   73.468854]  validate_chain+0x8fa/0xec0
[   73.469329]  __lock_acquire+0x563/0x940
[   73.469800]  lock_acquire+0xd2/0x390
[   73.470240]  ? tcp_sendpage+0x23/0x50
[   73.470689]  lock_sock_nested+0x72/0xa0
[   73.471163]  ? tcp_sendpage+0x23/0x50
[   73.471624]  tcp_sendpage+0x23/0x50
[   73.472059]  inet_sendpage+0x4f/0x80
[   73.472504]  kernel_sendpage+0x57/0xc0
[   73.472965]  ? mark_held_locks+0x4f/0x70
[   73.473445]  nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[   73.474062]  nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[   73.474697]  __blk_mq_try_issue_directly+0x109/0x1b0
[   73.475324]  blk_mq_request_issue_directly+0x4b/0x80
[   73.475953]  blk_mq_try_issue_list_directly+0x62/0xf0
[   73.476587]  blk_mq_sched_insert_requests+0x192/0x2b0
[   73.477225]  blk_mq_flush_plug_list+0x13c/0x260
[   73.477802]  blk_flush_plug_list+0xd7/0x100
[   73.478338]  blk_finish_plug+0x21/0x30
[   73.478818]  _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[   73.479457]  __xfs_buf_submit+0x85/0x220 [xfs]
[   73.480091]  xfs_buf_read_map+0x105/0x2a0 [xfs]
[   73.480728]  ? xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.481331]  xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[   73.482036]  ? xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.482635]  xfs_read_agi+0xb4/0x1d0 [xfs]
[   73.483221]  xfs_ialloc_read_agi+0x48/0x170 [xfs]
[   73.483866]  xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[   73.484513]  xfs_dir_ialloc+0x72/0x630 [xfs]
[   73.485092]  ? xfs_ilock+0xbf/0x250 [xfs]
[   73.485648]  xfs_create+0x241/0x6b0 [xfs]
[   73.486198]  xfs_generic_create+0x205/0x2c0 [xfs]
[   73.486857]  lookup_open+0x4f6/0x630
[   73.487317]  path_openat+0x298/0xa80
[   73.487786]  ? __lock_acquire+0x581/0x940
[   73.488302]  do_filp_open+0x93/0x100
[   73.488769]  ? do_raw_spin_unlock+0x49/0xc0
[   73.489307]  ? _raw_spin_unlock+0x1f/0x30
[   73.489825]  do_sys_openat2+0x24b/0x310
[   73.490321]  do_sys_open+0x4b/0x80
[   73.490764]  do_syscall_64+0x33/0x40
[   73.491230]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   73.491890] RIP: 0033:0x7f7d6f2eaead
[   73.492353] Code: c5 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0
ff ff 73 31 c3 48 83 ec 08 e8 7e f4 ff ff 48 89 04 24 b8 02 00 00 00 0f
05 <48> 8b 3c 24 48 89 c2 e8 c7 f4 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[   73.494713] RSP: 002b:00007ffc072e8910 EFLAGS: 00000293 ORIG_RAX:
0000000000000002
[   73.495671] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007f7d6f2eaead
[   73.496526] RDX: 00000000000001a4 RSI: 0000000000000041 RDI:
00007f7d6d666690
[   73.497381] RBP: 0000000000000000 R08: 0000000000000041 R09:
0000000000000041
[   73.498232] R10: 00007ffc072e85a0 R11: 0000000000000293 R12:
0000000000000000
[   73.499121] R13: 000000003b600000 R14: 00007f7d47136000 R15:
00007f7d6d666510
[  111.135896] XFS (nvme1n1): Unmounting Filesystem
[  111.308034] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
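For reference, the "Chain exists of" section in the splat above reports a
closed cycle in lockdep's lock dependency graph: the previously recorded
chain sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5, plus the
new edge from fio acquiring sk_lock-AF_INET while holding the XFS inode
lock. A minimal, purely illustrative sketch of that cycle check (this is
NOT the kernel's implementation, just a model using the lock class names
from the report):

```python
# Model of the dependency edges lockdep recorded for this splat.
# "A -> B" here means: lock class A was held while B was acquired.
deps = {
    "sk_lock-AF_INET": ["sb_internal"],
    "sb_internal": ["&xfs_dir_ilock_class/5"],
    # The new edge reported by this splat (xfs_ilock held at tcp_sendpage):
    "&xfs_dir_ilock_class/5": ["sk_lock-AF_INET"],
}

def find_cycle(graph):
    """Depth-first search; returns the lock names on the first cycle found."""
    def dfs(node, path, visited):
        if node in path:
            return path[path.index(node):]  # cycle closed
        if node in visited:
            return None
        visited.add(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path + [node], visited)
            if cycle:
                return cycle
        return None

    for start in graph:
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None

print(find_cycle(deps))
```

With the edges above, the search reports all three lock classes on the
cycle, which is exactly the ABBA-style inversion the CPU0/CPU1 scenario in
the splat describes.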



