kernel null pointer at nvme_tcp_init_iter[nvme_tcp] with blktests nvme-tcp/012
Chaitanya Kulkarni
Chaitanya.Kulkarni at wdc.com
Sun Feb 7 00:48:22 EST 2021
On 2/6/21 20:51, Yi Zhang wrote:
> The issue was introduced after merging the NVMe updates:
>
> commit 0fd6456fd1f4c8f3ec5a2df6ed7f34458a180409 (HEAD)
> Merge: 44d10e4b2f2c 0d7389718c32
> Author: Jens Axboe <axboe at kernel.dk>
> Date: Tue Feb 2 07:12:06 2021 -0700
>
> Merge branch 'for-5.12/drivers' into for-next
>
> * for-5.12/drivers: (22 commits)
> nvme-tcp: use cancel tagset helper for tear down
> nvme-rdma: use cancel tagset helper for tear down
> nvme-tcp: add clean action for failed reconnection
> nvme-rdma: add clean action for failed reconnection
> nvme-core: add cancel tagset helpers
> nvme-core: get rid of the extra space
> nvme: add tracing of zns commands
> nvme: parse format nvm command details when tracing
> nvme: update enumerations for status codes
> nvmet: add lba to sect conversion helpers
> nvmet: remove extra variable in identify ns
> nvmet: remove extra variable in id-desclist
> nvmet: remove extra variable in smart log nsid
> nvme: refactor ns->ctrl by request
> nvme-tcp: pass multipage bvec to request iov_iter
> nvme-tcp: get rid of unused helper function
> nvme-tcp: fix wrong setting of request iov_iter
> nvme: support command retry delay for admin command
> nvme: constify static attribute_group structs
> nvmet-fc: use RCU proctection for assoc_list
>
>
> On 2/6/21 11:08 AM, Yi Zhang wrote:
>> blktests nvme-tcp/012
>
Running the test a few times, I hit this once; no sign of an Oops though :-
# nvme_trtype=tcp ./check nvme/012
nvme/012 (run mkfs and data verification fio job on NVMeOF block
device-backed ns) [failed]
runtime 37.624s ... 42.272s
something found in dmesg:
[ 69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
[ 69.330277] loop: module loaded
[ 69.333383] loop0: detected capacity change from 2097152 to 0
[ 69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 69.396581] nvmet: creating controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
[ 69.399482] nvme nvme1: creating 64 I/O queues.
[ 69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
[ 69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1",
addr 127.0.0.1:4420
[ 72.381282] XFS (nvme1n1): Mounting V5 Filesystem
...
(See '/root/blktests/results/nodev/nvme/012.dmesg' for the entire
message)
blktests (master) # dmesg -c
[ 38.347661] nvme nvme0: 63/0/0 default/read/poll queues
[ 69.198819] run blktests nvme/012 at 2021-02-06 21:32:23
[ 69.330277] loop: module loaded
[ 69.333383] loop0: detected capacity change from 2097152 to 0
[ 69.351091] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 69.372439] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 69.396581] nvmet: creating controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:e4cfc949-8f19-4db2-a232-ab360b79204a.
[ 69.399482] nvme nvme1: creating 64 I/O queues.
[ 69.414721] nvme nvme1: mapped 64/0/0 default/read/poll queues.
[ 69.448193] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420
[ 72.381282] XFS (nvme1n1): Mounting V5 Filesystem
[ 72.394603] XFS (nvme1n1): Ending clean mount
[ 72.395463] xfs filesystem being mounted at /mnt/blktests supports
timestamps until 2038 (0x7fffffff)
[ 73.402838] ======================================================
[ 73.403491] WARNING: possible circular locking dependency detected
[ 73.404125] 5.11.0-rc6blk+ #166 Tainted: G OE
[ 73.404690] ------------------------------------------------------
[ 73.405332] fio/3671 is trying to acquire lock:
[ 73.405803] ffff88881d110cb0 (sk_lock-AF_INET){+.+.}-{0:0}, at:
tcp_sendpage+0x23/0x50
[ 73.406627]
but task is already holding lock:
[ 73.407246] ffff888128cabcf0 (&xfs_dir_ilock_class/5){+.+.}-{3:3},
at: xfs_ilock+0xbf/0x250 [xfs]
[ 73.408233]
which lock already depends on the new lock.
[ 73.409096]
the existing dependency chain (in reverse order) is:
[ 73.409885]
-> #3 (&xfs_dir_ilock_class/5){+.+.}-{3:3}:
[ 73.410594] lock_acquire+0xd2/0x390
[ 73.411045] down_write_nested+0x47/0x110
[ 73.411530] xfs_ilock+0xbf/0x250 [xfs]
[ 73.412068] xfs_create+0x1d9/0x6b0 [xfs]
[ 73.412626] xfs_generic_create+0x205/0x2c0 [xfs]
[ 73.413255] lookup_open+0x4f6/0x630
[ 73.413710] path_openat+0x298/0xa80
[ 73.414158] do_filp_open+0x93/0x100
[ 73.414606] do_sys_openat2+0x24b/0x310
[ 73.415090] do_sys_open+0x4b/0x80
[ 73.415569] do_syscall_64+0x33/0x40
[ 73.416075] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 73.416747]
-> #2 (sb_internal){.+.+}-{0:0}:
[ 73.417436] lock_acquire+0xd2/0x390
[ 73.417938] xfs_trans_alloc+0x19e/0x2e0 [xfs]
[ 73.418607] xfs_vn_update_time+0xc8/0x250 [xfs]
[ 73.419295] touch_atime+0x16b/0x200
[ 73.419814] xfs_file_mmap+0xa7/0xb0 [xfs]
[ 73.420447] mmap_region+0x3f6/0x690
[ 73.420955] do_mmap+0x379/0x580
[ 73.421415] vm_mmap_pgoff+0xdf/0x170
[ 73.421929] ksys_mmap_pgoff+0x1dd/0x240
[ 73.422477] do_syscall_64+0x33/0x40
[ 73.422980] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 73.423672]
-> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[ 73.424426] lock_acquire+0xd2/0x390
[ 73.424932] __might_fault+0x5e/0x80
[ 73.425433] _copy_from_user+0x1e/0xa0
[ 73.425966] do_ip_setsockopt.isra.15+0xbba/0x12f0
[ 73.426616] ip_setsockopt+0x34/0x90
[ 73.427120] __sys_setsockopt+0x8f/0x110
[ 73.427681] __x64_sys_setsockopt+0x20/0x30
[ 73.428252] do_syscall_64+0x33/0x40
[ 73.428751] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 73.429422]
-> #0 (sk_lock-AF_INET){+.+.}-{0:0}:
[ 73.430154] validate_chain+0x8fa/0xec0
[ 73.430681] __lock_acquire+0x563/0x940
[ 73.431212] lock_acquire+0xd2/0x390
[ 73.431719] lock_sock_nested+0x72/0xa0
[ 73.432252] tcp_sendpage+0x23/0x50
[ 73.432740] inet_sendpage+0x4f/0x80
[ 73.433240] kernel_sendpage+0x57/0xc0
[ 73.433760] nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[ 73.434435] nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[ 73.435104] __blk_mq_try_issue_directly+0x109/0x1b0
[ 73.435768] blk_mq_request_issue_directly+0x4b/0x80
[ 73.436433] blk_mq_try_issue_list_directly+0x62/0xf0
[ 73.437104] blk_mq_sched_insert_requests+0x192/0x2b0
[ 73.437774] blk_mq_flush_plug_list+0x13c/0x260
[ 73.438389] blk_flush_plug_list+0xd7/0x100
[ 73.438960] blk_finish_plug+0x21/0x30
[ 73.439485] _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[ 73.440171] __xfs_buf_submit+0x85/0x220 [xfs]
[ 73.440833] xfs_buf_read_map+0x105/0x2a0 [xfs]
[ 73.441511] xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[ 73.442254] xfs_read_agi+0xb4/0x1d0 [xfs]
[ 73.442874] xfs_ialloc_read_agi+0x48/0x170 [xfs]
[ 73.443568] xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[ 73.444277] xfs_dir_ialloc+0x72/0x630 [xfs]
[ 73.444927] xfs_create+0x241/0x6b0 [xfs]
[ 73.445550] xfs_generic_create+0x205/0x2c0 [xfs]
[ 73.446255] lookup_open+0x4f6/0x630
[ 73.446759] path_openat+0x298/0xa80
[ 73.447268] do_filp_open+0x93/0x100
[ 73.447770] do_sys_openat2+0x24b/0x310
[ 73.448301] do_sys_open+0x4b/0x80
[ 73.448779] do_syscall_64+0x33/0x40
[ 73.449275] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 73.449944]
other info that might help us debug this:
[ 73.450899] Chain exists of:
sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5
[ 73.452222] Possible unsafe locking scenario:
[ 73.452931] CPU0 CPU1
[ 73.453479] ---- ----
[ 73.454024] lock(&xfs_dir_ilock_class/5);
[ 73.454528] lock(sb_internal);
[ 73.455217] lock(&xfs_dir_ilock_class/5);
[ 73.456027] lock(sk_lock-AF_INET);
[ 73.456463]
*** DEADLOCK ***
[ 73.457175] 6 locks held by fio/3671:
[ 73.457618] #0: ffff8881185a9488 (sb_writers#9){.+.+}-{0:0}, at:
path_openat+0xa61/0xa80
[ 73.458604] #1: ffff888128cabfe0
(&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at:
path_openat+0x287/0xa80
[ 73.459847] #2: ffff8881185a96a8 (sb_internal){.+.+}-{0:0}, at:
xfs_create+0x1b4/0x6b0 [xfs]
[ 73.460926] #3: ffff888128cabcf0
(&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xbf/0x250 [xfs]
[ 73.462100] #4: ffff88881bd54c58 (hctx->srcu){....}-{0:0}, at:
hctx_lock+0x62/0xe0
[ 73.463029] #5: ffff888142ab29d0 (&queue->send_mutex){+.+.}-{3:3},
at: nvme_tcp_queue_rq+0x2fc/0x330 [nvme_tcp]
[ 73.464288]
stack backtrace:
[ 73.464824] CPU: 16 PID: 3671 Comm: fio Tainted: G OE
5.11.0-rc6blk+ #166
[ 73.465785] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[ 73.467176] Call Trace:
[ 73.467483] dump_stack+0x8d/0xb5
[ 73.467910] check_noncircular+0x119/0x130
[ 73.468413] ? save_trace+0x3d/0x2c0
[ 73.468854] validate_chain+0x8fa/0xec0
[ 73.469329] __lock_acquire+0x563/0x940
[ 73.469800] lock_acquire+0xd2/0x390
[ 73.470240] ? tcp_sendpage+0x23/0x50
[ 73.470689] lock_sock_nested+0x72/0xa0
[ 73.471163] ? tcp_sendpage+0x23/0x50
[ 73.471624] tcp_sendpage+0x23/0x50
[ 73.472059] inet_sendpage+0x4f/0x80
[ 73.472504] kernel_sendpage+0x57/0xc0
[ 73.472965] ? mark_held_locks+0x4f/0x70
[ 73.473445] nvme_tcp_try_send+0x137/0x7d0 [nvme_tcp]
[ 73.474062] nvme_tcp_queue_rq+0x317/0x330 [nvme_tcp]
[ 73.474697] __blk_mq_try_issue_directly+0x109/0x1b0
[ 73.475324] blk_mq_request_issue_directly+0x4b/0x80
[ 73.475953] blk_mq_try_issue_list_directly+0x62/0xf0
[ 73.476587] blk_mq_sched_insert_requests+0x192/0x2b0
[ 73.477225] blk_mq_flush_plug_list+0x13c/0x260
[ 73.477802] blk_flush_plug_list+0xd7/0x100
[ 73.478338] blk_finish_plug+0x21/0x30
[ 73.478818] _xfs_buf_ioapply+0x2cc/0x3c0 [xfs]
[ 73.479457] __xfs_buf_submit+0x85/0x220 [xfs]
[ 73.480091] xfs_buf_read_map+0x105/0x2a0 [xfs]
[ 73.480728] ? xfs_read_agi+0xb4/0x1d0 [xfs]
[ 73.481331] xfs_trans_read_buf_map+0x2b2/0x610 [xfs]
[ 73.482036] ? xfs_read_agi+0xb4/0x1d0 [xfs]
[ 73.482635] xfs_read_agi+0xb4/0x1d0 [xfs]
[ 73.483221] xfs_ialloc_read_agi+0x48/0x170 [xfs]
[ 73.483866] xfs_dialloc_select_ag+0x94/0x2a0 [xfs]
[ 73.484513] xfs_dir_ialloc+0x72/0x630 [xfs]
[ 73.485092] ? xfs_ilock+0xbf/0x250 [xfs]
[ 73.485648] xfs_create+0x241/0x6b0 [xfs]
[ 73.486198] xfs_generic_create+0x205/0x2c0 [xfs]
[ 73.486857] lookup_open+0x4f6/0x630
[ 73.487317] path_openat+0x298/0xa80
[ 73.487786] ? __lock_acquire+0x581/0x940
[ 73.488302] do_filp_open+0x93/0x100
[ 73.488769] ? do_raw_spin_unlock+0x49/0xc0
[ 73.489307] ? _raw_spin_unlock+0x1f/0x30
[ 73.489825] do_sys_openat2+0x24b/0x310
[ 73.490321] do_sys_open+0x4b/0x80
[ 73.490764] do_syscall_64+0x33/0x40
[ 73.491230] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 73.491890] RIP: 0033:0x7f7d6f2eaead
[ 73.492353] Code: c5 20 00 00 75 10 b8 02 00 00 00 0f 05 48 3d 01 f0
ff ff 73 31 c3 48 83 ec 08 e8 7e f4 ff ff 48 89 04 24 b8 02 00 00 00 0f
05 <48> 8b 3c 24 48 89 c2 e8 c7 f4 ff ff 48 89 d0 48 83 c4 08 48 3d 01
[ 73.494713] RSP: 002b:00007ffc072e8910 EFLAGS: 00000293 ORIG_RAX:
0000000000000002
[ 73.495671] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
00007f7d6f2eaead
[ 73.496526] RDX: 00000000000001a4 RSI: 0000000000000041 RDI:
00007f7d6d666690
[ 73.497381] RBP: 0000000000000000 R08: 0000000000000041 R09:
0000000000000041
[ 73.498232] R10: 00007ffc072e85a0 R11: 0000000000000293 R12:
0000000000000000
[ 73.499121] R13: 000000003b600000 R14: 00007f7d47136000 R15:
00007f7d6d666510
[ 111.135896] XFS (nvme1n1): Unmounting Filesystem
[ 111.308034] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"