UBSAN observed with blktests use_siw nvme-rdma nvme/003 on 5.10.0-rc7
Yi Zhang
yi.zhang at redhat.com
Sun Dec 13 04:10:06 EST 2020
Hi
I found this UBSAN issue with latest 5.10.0-rc7, could anyone help check it?
on x86_64:
[ 339.588782] run blktests nvme/003 at 2020-12-11 22:58:49
[ 340.716474] SoftiWARP attached
[ 340.768279] eno2 speed is unknown, defaulting to 1000
[ 340.768378] eno2 speed is unknown, defaulting to 1000
[ 340.768983] eno2 speed is unknown, defaulting to 1000
[ 340.785478] eno3 speed is unknown, defaulting to 1000
[ 340.785549] eno3 speed is unknown, defaulting to 1000
[ 340.786220] eno3 speed is unknown, defaulting to 1000
[ 340.802511] eno4 speed is unknown, defaulting to 1000
[ 340.802582] eno4 speed is unknown, defaulting to 1000
[ 340.803239] eno4 speed is unknown, defaulting to 1000
[ 340.819554] lo speed is unknown, defaulting to 1000
[ 340.819625] lo speed is unknown, defaulting to 1000
[ 340.820274] lo speed is unknown, defaulting to 1000
[ 342.139810] loop: module loaded
[ 342.213031] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 342.243137] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 342.247364] nvmet_rdma: enabling port 0 (10.10.183.223:4420)
[ 342.320993] ================================================================================
[ 342.321696] UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
[ 342.321845] shift exponent 64 is too large for 64-bit type 'long unsigned int'
[ 342.322005] CPU: 4 PID: 441 Comm: kworker/u12:9 Tainted: G S 5.10.0-rc7 #1
[ 342.322014] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015
[ 342.322029] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[ 342.322045] Call Trace:
[ 342.322063] dump_stack+0x99/0xcb
[ 342.322082] ubsan_epilogue+0x5/0x40
[ 342.322092] __ubsan_handle_shift_out_of_bounds.cold.11+0xb4/0xf3
[ 342.322121] ? down_write+0x183/0x3d0
[ 342.322155] siw_qp_modify.cold.8+0x2d/0x32 [siw]
[ 342.322168] ? __local_bh_enable_ip+0xa5/0xf0
[ 342.322199] siw_accept+0x906/0x1b60 [siw]
[ 342.322241] ? siw_connect+0x17a0/0x17a0 [siw]
[ 342.322268] ? mark_held_locks+0xb7/0x120
[ 342.322294] ? lockdep_hardirqs_on_prepare+0x28f/0x3e0
[ 342.322303] ? _raw_spin_unlock_irqrestore+0x39/0x40
[ 342.322332] iw_cm_accept+0x1f4/0x430 [iw_cm]
[ 342.322369] rdma_accept+0x3fa/0xb10 [rdma_cm]
[ 342.322383] ? check_flush_dependency+0x410/0x410
[ 342.322402] ? cma_rep_recv+0x570/0x570 [rdma_cm]
[ 342.322434] ? complete+0x18/0x70
[ 342.322481] nvmet_rdma_queue_connect+0x1a62/0x2680 [nvmet_rdma]
[ 342.322541] ? nvmet_rdma_alloc_cmds+0xce0/0xce0 [nvmet_rdma]
[ 342.322563] ? lock_downgrade+0x700/0x700
[ 342.322583] ? __xa_alloc_cyclic+0xef/0x350
[ 342.322642] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 342.322651] ? __ww_mutex_die+0x190/0x190
[ 342.322685] cma_cm_event_handler+0xf2/0x500 [rdma_cm]
[ 342.322712] iw_conn_req_handler+0x910/0xcb0 [rdma_cm]
[ 342.322726] ? lock_downgrade+0x700/0x700
[ 342.322734] ? rcu_read_unlock+0x50/0x50
[ 342.322748] ? sched_clock+0x5/0x10
[ 342.322789] ? cma_ib_handler+0x8a0/0x8a0 [rdma_cm]
[ 342.322810] ? __kasan_kmalloc.constprop.7+0xc1/0xd0
[ 342.322875] cm_work_handler+0x121c/0x17a0 [iw_cm]
[ 342.322910] ? iw_cm_reject+0x190/0x190 [iw_cm]
[ 342.323005] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 342.323016] ? lockdep_hardirqs_on_prepare+0x28f/0x3e0
[ 342.323043] process_one_work+0x8fb/0x16c0
[ 342.323054] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 342.323091] ? pwq_dec_nr_in_flight+0x320/0x320
[ 342.323141] worker_thread+0x87/0xb40
[ 342.323177] ? process_one_work+0x16c0/0x16c0
[ 342.323191] kthread+0x35f/0x430
[ 342.323203] ? kthread_mod_delayed_work+0x180/0x180
[ 342.323223] ret_from_fork+0x22/0x30
[ 342.323319] ================================================================================
[ 342.335402] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:ce2e8205-87d6-4212-b108-e1f267e1c459.
[ 342.341153] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.10.183.223:4420
[ 352.445399] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 352.484535] ======================================================
[ 352.484608] WARNING: possible circular locking dependency detected
[ 352.484683] 5.10.0-rc7 #1 Tainted: G S
[ 352.484743] ------------------------------------------------------
[ 352.484816] kworker/1:1/51 is trying to acquire lock:
[ 352.484878] ffff88820928f3e0 (&id_priv->handler_mutex){+.+.}-{3:3}, at: rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.485013]
but task is already holding lock:
[ 352.485081] ffff888100f3fe00 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x809/0x16c0
[ 352.485215]
which lock already depends on the new lock.
[ 352.485307]
the existing dependency chain (in reverse order) is:
[ 352.485392]
-> #3 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
[ 352.485499] lock_acquire+0x1c8/0x880
[ 352.485552] process_one_work+0x85a/0x16c0
[ 352.485610] worker_thread+0x87/0xb40
[ 352.485663] kthread+0x35f/0x430
[ 352.485712] ret_from_fork+0x22/0x30
[ 352.485763]
-> #2 ((wq_completion)events){+.+.}-{0:0}:
[ 352.485851] lock_acquire+0x1c8/0x880
[ 352.485904] flush_workqueue+0x109/0x1390
[ 352.485963] nvmet_rdma_queue_connect+0x1b52/0x2680 [nvmet_rdma]
[ 352.486046] cma_cm_event_handler+0xf2/0x500 [rdma_cm]
[ 352.486121] iw_conn_req_handler+0x910/0xcb0 [rdma_cm]
[ 352.486192] cm_work_handler+0x121c/0x17a0 [iw_cm]
[ 352.486257] process_one_work+0x8fb/0x16c0
[ 352.486315] worker_thread+0x87/0xb40
[ 352.486367] kthread+0x35f/0x430
[ 352.486415] ret_from_fork+0x22/0x30
[ 352.486466]
-> #1 (&id_priv->handler_mutex/1){+.+.}-{3:3}:
[ 352.490786] lock_acquire+0x1c8/0x880
[ 352.492963] __mutex_lock+0x163/0x1410
[ 352.495140] iw_conn_req_handler+0x343/0xcb0 [rdma_cm]
[ 352.497264] cm_work_handler+0x121c/0x17a0 [iw_cm]
[ 352.499321] process_one_work+0x8fb/0x16c0
[ 352.501358] worker_thread+0x87/0xb40
[ 352.503412] kthread+0x35f/0x430
[ 352.505469] ret_from_fork+0x22/0x30
[ 352.507474]
-> #0 (&id_priv->handler_mutex){+.+.}-{3:3}:
[ 352.511362] check_prevs_add+0x3fd/0x2460
[ 352.513280] __lock_acquire+0x255e/0x3080
[ 352.515151] lock_acquire+0x1c8/0x880
[ 352.517001] __mutex_lock+0x163/0x1410
[ 352.518830] rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.520648] nvmet_rdma_free_queue+0x7a/0x380 [nvmet_rdma]
[ 352.522470] nvmet_rdma_release_queue_work+0x3b/0x90 [nvmet_rdma]
[ 352.524288] process_one_work+0x8fb/0x16c0
[ 352.526052] worker_thread+0x87/0xb40
[ 352.527758] kthread+0x35f/0x430
[ 352.529456] ret_from_fork+0x22/0x30
[ 352.531091]
other info that might help us debug this:
[ 352.535736] Chain exists of:
&id_priv->handler_mutex --> (wq_completion)events --> (work_completion)(&queue->release_work)
[ 352.540596] Possible unsafe locking scenario:
[ 352.543799] CPU0 CPU1
[ 352.545393] ---- ----
[ 352.546972] lock((work_completion)(&queue->release_work));
[ 352.548588] lock((wq_completion)events);
[ 352.550243] lock((work_completion)(&queue->release_work));
[ 352.551928] lock(&id_priv->handler_mutex);
[ 352.553604]
*** DEADLOCK ***
[ 352.558447] 2 locks held by kworker/1:1/51:
[ 352.560048] #0: ffff888100052d48 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x7d5/0x16c0
[ 352.561773] #1: ffff888100f3fe00 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x809/0x16c0
[ 352.563557]
stack backtrace:
[ 352.567023] CPU: 1 PID: 51 Comm: kworker/1:1 Tainted: G S 5.10.0-rc7 #1
[ 352.568839] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.1.10 03/10/2015
[ 352.570699] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
[ 352.572587] Call Trace:
[ 352.574452] dump_stack+0x99/0xcb
[ 352.576304] check_noncircular+0x27f/0x320
[ 352.578171] ? print_circular_bug+0x440/0x440
[ 352.580032] ? print_irqtrace_events+0x270/0x270
[ 352.581887] ? mark_lock.part.29+0x107/0x14e0
[ 352.583740] ? print_irqtrace_events+0x270/0x270
[ 352.585597] check_prevs_add+0x3fd/0x2460
[ 352.587460] ? check_irq_usage+0xe30/0xe30
[ 352.589328] ? find_held_lock+0x1c0/0x1c0
[ 352.591197] ? sched_clock_cpu+0x18/0x1d0
[ 352.593054] __lock_acquire+0x255e/0x3080
[ 352.594901] lock_acquire+0x1c8/0x880
[ 352.596748] ? rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.598610] ? rcu_read_unlock+0x50/0x50
[ 352.600474] __mutex_lock+0x163/0x1410
[ 352.602333] ? rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.604190] ? wait_for_completion+0x89/0x260
[ 352.606044] ? rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.607883] ? mutex_lock_io_nested+0x12e0/0x12e0
[ 352.609729] ? rcu_read_lock_sched_held+0xaf/0xe0
[ 352.611577] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 352.613428] ? nvmet_sq_destroy+0x164/0x530 [nvmet]
[ 352.615296] ? rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.617160] rdma_destroy_id+0x17/0x20 [rdma_cm]
[ 352.619022] nvmet_rdma_free_queue+0x7a/0x380 [nvmet_rdma]
[ 352.620914] ? lockdep_hardirqs_on_prepare+0x28f/0x3e0
[ 352.622769] nvmet_rdma_release_queue_work+0x3b/0x90 [nvmet_rdma]
[ 352.624602] process_one_work+0x8fb/0x16c0
[ 352.626428] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 352.628265] ? pwq_dec_nr_in_flight+0x320/0x320
[ 352.630121] worker_thread+0x87/0xb40
[ 352.631987] ? __kthread_parkme+0xd1/0x1a0
[ 352.633841] ? process_one_work+0x16c0/0x16c0
[ 352.635674] kthread+0x35f/0x430
[ 352.637490] ? kthread_mod_delayed_work+0x180/0x180
[ 352.639293] ret_from_fork+0x22/0x30
[ 353.044114] infiniband lo_siw: ib_query_port failed (-19)
[ 353.046030] infiniband eno1_siw: ib_query_port failed (-19)
[ 353.047937] infiniband eno2_siw: ib_query_port failed (-19)
[ 353.049822] infiniband eno3_siw: ib_query_port failed (-19)
[ 353.051627] infiniband eno4_siw: ib_query_port failed (-19)
[ 353.053566] SoftiWARP detached
on ppc64le:
[10815.999304] run blktests nvme/003 at 2020-12-12 21:30:01
[10819.200531] SoftiWARP attached
[10819.260326] lo speed is unknown, defaulting to 1000
[10819.260338] lo speed is unknown, defaulting to 1000
[10819.260360] lo speed is unknown, defaulting to 1000
[10819.560036] loop: module loaded
[10819.660726] Loading iSCSI transport class v2.0-870.
[10819.710612] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[10819.719478] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[10819.719777] nvmet_rdma: enabling port 0 (10.16.224.162:4420)
[10819.731002] kworker/u16:2: vmalloc: allocation failure: 0 bytes, mode:0xdc0(GFP_KERNEL|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
[10819.731022] CPU: 5 PID: 344 Comm: kworker/u16:2 Not tainted 5.10.0-rc7 #1
[10819.731033] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[10819.731038] Call Trace:
[10819.731044] [c000000007eeb370] [c00000000079cce0] dump_stack+0xc4/0x114 (unreliable)
[10819.731052] [c000000007eeb3b0] [c00000000044cf4c] warn_alloc+0x12c/0x1c0
[10819.731058] [c000000007eeb450] [c000000000442f9c] __vmalloc_node_range+0x24c/0x3e0
[10819.731063] [c000000007eeb500] [c000000000443348] vzalloc+0x58/0x70
[10819.731072] [c000000007eeb570] [c008000000f499ec] siw_qp_modify+0x604/0x7f0 [siw]
[10819.731080] [c000000007eeb600] [c008000000f43ec0] siw_accept+0x248/0x860 [siw]
[10819.731086] [c000000007eeb6e0] [c0080000008f02c8] iw_cm_accept+0xf0/0x290 [iw_cm]
[10819.731096] [c000000007eeb740] [c008000000e92b7c] rdma_accept+0x144/0x340 [rdma_cm]
[10819.731102] [c000000007eeb7b0] [c008000000a14390] nvmet_rdma_queue_connect+0xae8/0xe20 [nvmet_rdma]
[10819.731112] [c000000007eeb910] [c008000000e92448] cma_cm_event_handler+0x50/0x150 [rdma_cm]
[10819.731121] [c000000007eeb950] [c008000000e98bc0] iw_conn_req_handler+0x388/0x470 [rdma_cm]
[10819.731129] [c000000007eeba50] [c0080000008f2288] cm_work_handler+0x780/0x820 [iw_cm]
[10819.731136] [c000000007eebc70] [c00000000017cdf0] process_one_work+0x260/0x530
[10819.731143] [c000000007eebd10] [c00000000017d148] worker_thread+0x88/0x5e0
[10819.731149] [c000000007eebdb0] [c000000000187f10] kthread+0x1a0/0x1b0
[10819.731155] [c000000007eebe20] [c00000000000d3f0] ret_from_kernel_thread+0x5c/0x6c
[10819.731161] Mem-Info:
[10819.731168] active_anon:71 inactive_anon:3279 isolated_anon:0
active_file:7133 inactive_file:25899 isolated_file:0
unevictable:0 dirty:2 writeback:0
slab_reclaimable:2660 slab_unreclaimable:4555
mapped:1630 shmem:197 pagetables:80 bounce:0
free:84961 free_pcp:82 free_cma:0
[10819.731185] Node 0 active_anon:4544kB inactive_anon:209856kB active_file:456512kB inactive_file:1657536kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:104320kB dirty:128kB writeback:0kB shmem:12608kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:3632kB all_unreclaimable? no
[10819.731195] Node 0 Normal free:5437504kB min:180224kB low:225280kB high:270336kB reserved_highatomic:0KB active_anon:4544kB inactive_anon:209856kB active_file:456512kB inactive_file:1657536kB unevictable:0kB writepending:128kB present:8388608kB managed:8297472kB mlocked:0kB pagetables:5120kB bounce:0kB free_pcp:5248kB local_pcp:256kB free_cma:0kB
[10819.731210] lowmem_reserve[]: 0 0 0
[10819.731216] Node 0 Normal: 6*64kB (ME) 11*128kB (UME) 65*256kB (M) 111*512kB (UME) 41*1024kB (UM) 5*2048kB (UM) 4*4096kB (UM) 2*8192kB (UE) 322*16384kB (M) = 5435904kB
[10819.731241] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16384kB
[10819.731243] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
[10819.731244] 33258 total pagecache pages
[10819.731248] 0 pages in swap cache
[10819.731251] Swap cache stats: add 0, delete 0, find 0/0
[10819.731255] Free swap = 4222912kB
[10819.731258] Total swap = 4222912kB
[10819.731261] 131072 pages RAM
[10819.731263] 0 pages HighMem/MovableOnly
[10819.731266] 1424 pages reserved
[10819.731269] 0 pages cma reserved
[10819.731273] 0 pages hwpoisoned
[10819.731304] nvmet_rdma: rdma_accept failed (error code = -12)
[10819.823296] nvme nvme0: Connect rejected: status -104 (reset by remote host).
[10819.823312] nvme nvme0: rdma connection establishment failed (-104)
[10820.011181] iscsi: registered transport (iser)
[10820.680443] Rounding down aligned max_sectors from 4294967295 to 4294967168
[10820.680501] db_root: cannot open: /etc/target
[10821.539792] lo speed is unknown, defaulting to 1000
[10821.539804] ib_srpt MAD registration failed for lo_siw-1.
[10821.539811] ib_srpt srpt_add_one(lo_siw) failed.
[10823.310251] RPC: Registered rdma transport module.
[10823.310253] RPC: Registered rdma backchannel transport module.
[10830.441052] lo speed is unknown, defaulting to 1000
[10830.441415] SoftiWARP detached
Best Regards,
Yi Zhang
More information about the Linux-nvme
mailing list