[PATCH 0/3] Introduce fabrics controller loss timeout
Yi Zhang
yizhan at redhat.com
Sun Mar 26 17:41:36 PDT 2017
Hello Sagi
With these three patches applied, reconnecting stopped after 60 attempts.
I then started another test that runs fio on nvme0n1 [1] on the client before executing "nvmetcli clear" on the target side.
After that, I found another issue: the fio jobs cannot be stopped, even with "Ctrl + C", and the device node cannot be released either [2].
Here is the kernel log[3].
Let me know if you need more info, thanks
[1]
fio -filename=/dev/nvme0n1 -iodepth=1 -thread -rw=randwrite -ioengine=psync -bssplit=5k/10:9k/10:13k/10:17k/10:21k/10:25k/10:29k/10:33k/10:37k/10:41k/10 -bs_unaligned -runtime=1200 -size=-group_reporting -name=mytest -numjobs=60
[2]
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
sda 8:0 0 279.4G 0 disk
├─sda2 8:2 0 278.4G 0 part
│ ├─rhelp_rdma04-swap 253:1 0 15.8G 0 lvm [SWAP]
│ ├─rhelp_rdma04-home 253:2 0 212.6G 0 lvm /home
│ └─rhelp_rdma04-root 253:0 0 50G 0 lvm /
└─sda1 8:1 0 1G 0 part /boot
nvme0n1 259:0 0 250G 0 disk
[3]
[ 356.812399] nvme nvme0: Reconnecting in 10 seconds...
[ 366.965161] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 367.002048] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 367.029926] nvme nvme0: Failed reconnect attempt 21
[ 367.051905] nvme nvme0: Reconnecting in 10 seconds...
[ 371.444001] INFO: task kworker/u130:1:155 blocked for more than 120 seconds.
[ 371.480773] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 371.505608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 371.540918] kworker/u130:1 D 0 155 2 0x00000000
[ 371.565584] Workqueue: writeback wb_workfn (flush-259:0)
[ 371.590031] Call Trace:
[ 371.600981] __schedule+0x289/0x8f0
[ 371.616644] schedule+0x36/0x80
[ 371.630693] io_schedule+0x16/0x40
[ 371.645565] blk_mq_get_tag+0x16c/0x280
[ 371.662929] ? remove_wait_queue+0x60/0x60
[ 371.680942] __blk_mq_alloc_request+0x1b/0xe0
[ 371.700508] blk_mq_sched_get_request+0x1a0/0x240
[ 371.721616] blk_mq_make_request+0x113/0x620
[ 371.741215] generic_make_request+0x110/0x2c0
[ 371.760755] submit_bio+0x75/0x150
[ 371.776138] submit_bh_wbc+0x141/0x180
[ 371.793106] __block_write_full_page+0x13d/0x3b0
[ 371.814573] ? I_BDEV+0x20/0x20
[ 371.828657] ? I_BDEV+0x20/0x20
[ 371.842717] block_write_full_page+0xe5/0x110
[ 371.862312] blkdev_writepage+0x18/0x20
[ 371.879727] __writepage+0x13/0x40
[ 371.894593] write_cache_pages+0x26f/0x510
[ 371.913039] ? select_idle_sibling+0x29/0x3d0
[ 371.932593] ? compound_head+0x20/0x20
[ 371.949404] generic_writepages+0x51/0x80
[ 371.967972] blkdev_writepages+0x2f/0x40
[ 371.989381] do_writepages+0x1e/0x30
[ 372.007479] __writeback_single_inode+0x45/0x330
[ 372.028326] writeback_sb_inodes+0x280/0x570
[ 372.047594] __writeback_inodes_wb+0x8c/0xc0
[ 372.066852] wb_writeback+0x276/0x310
[ 372.083247] wb_workfn+0x19c/0x3b0
[ 372.098577] process_one_work+0x165/0x410
[ 372.116679] worker_thread+0x137/0x4c0
[ 372.133644] kthread+0x101/0x140
[ 372.148257] ? rescuer_thread+0x3b0/0x3b0
[ 372.166253] ? kthread_park+0x90/0x90
[ 372.182689] ret_from_fork+0x2c/0x40
[ 372.198802] INFO: task systemd-udevd:788 blocked for more than 120 seconds.
[ 372.230377] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 372.253129] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 372.288576] systemd-udevd D 0 788 1 0x00000002
[ 372.313244] Call Trace:
[ 372.324208] __schedule+0x289/0x8f0
[ 372.339835] schedule+0x36/0x80
[ 372.354198] io_schedule+0x16/0x40
[ 372.369040] blk_mq_get_tag+0x16c/0x280
[ 372.385867] ? remove_wait_queue+0x60/0x60
[ 372.404276] __blk_mq_alloc_request+0x1b/0xe0
[ 372.423849] blk_mq_sched_get_request+0x1a0/0x240
[ 372.444945] blk_mq_make_request+0x113/0x620
[ 372.464123] generic_make_request+0x110/0x2c0
[ 372.484885] submit_bio+0x75/0x150
[ 372.502586] submit_bh_wbc+0x141/0x180
[ 372.521625] __block_write_full_page+0x13d/0x3b0
[ 372.542552] ? I_BDEV+0x20/0x20
[ 372.556646] ? I_BDEV+0x20/0x20
[ 372.570750] block_write_full_page+0xe5/0x110
[ 372.590507] blkdev_writepage+0x18/0x20
[ 372.608514] __writepage+0x13/0x40
[ 372.623729] write_cache_pages+0x26f/0x510
[ 372.642116] ? compound_head+0x20/0x20
[ 372.659046] generic_writepages+0x51/0x80
[ 372.677447] blkdev_writepages+0x2f/0x40
[ 372.695072] do_writepages+0x1e/0x30
[ 372.711155] __filemap_fdatawrite_range+0xc6/0x100
[ 372.732778] filemap_write_and_wait+0x3d/0x80
[ 372.752330] __sync_blockdev+0x1f/0x40
[ 372.769151] fsync_bdev+0x44/0x50
[ 372.784048] invalidate_partition+0x24/0x50
[ 372.802835] rescan_partitions+0x52/0x3a0
[ 372.821426] ? selinux_capable+0x20/0x30
[ 372.839444] ? security_capable+0x48/0x60
[ 372.857427] __blkdev_reread_part+0x64/0x70
[ 372.876214] blkdev_reread_part+0x23/0x40
[ 372.894178] blkdev_ioctl+0x46c/0x900
[ 372.910650] block_ioctl+0x41/0x50
[ 372.925899] do_vfs_ioctl+0xa7/0x5e0
[ 372.941931] SyS_ioctl+0x79/0x90
[ 372.956410] ? SyS_flock+0x12c/0x1c0
[ 372.972407] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 372.995057] RIP: 0033:0x7f2604a22507
[ 373.013328] RSP: 002b:00007ffe3be8f228 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 373.049088] RAX: ffffffffffffffda RBX: 000056342ff88de0 RCX: 00007f2604a22507
[ 373.081210] RDX: 0000000000000000 RSI: 000000000000125f RDI: 000000000000000c
[ 373.113650] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007f2605dbb8c0
[ 373.145759] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 373.178107] R13: 00007ffe3be8b1d8 R14: 0000000000000008 R15: 0000000000010300
[ 373.210167] INFO: task fio:3324 blocked for more than 120 seconds.
[ 373.237948] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 373.260671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 373.295605] fio D 0 3324 3252 0x00000080
[ 373.320234] Call Trace:
[ 373.331152] __schedule+0x289/0x8f0
[ 373.346824] schedule+0x36/0x80
[ 373.360958] schedule_preempt_disabled+0xe/0x10
[ 373.381274] __mutex_lock.isra.8+0x266/0x500
[ 373.400423] __mutex_lock_slowpath+0x13/0x20
[ 373.419588] mutex_lock+0x2f/0x40
[ 373.434441] blkdev_put+0x20/0x120
[ 373.449748] blkdev_close+0x25/0x30
[ 373.466217] __fput+0xe7/0x210
[ 373.480691] ____fput+0xe/0x10
[ 373.495002] task_work_run+0x83/0xb0
[ 373.512914] exit_to_usermode_loop+0x59/0x85
[ 373.534017] do_syscall_64+0x165/0x180
[ 373.552724] entry_SYSCALL64_slow_path+0x25/0x25
[ 373.575867] RIP: 0033:0x2b89425194fd
[ 373.591921] RSP: 002b:00002b895b083c40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 373.626630] RAX: 0000000000000000 RBX: 00002b89431806d0 RCX: 00002b89425194fd
[ 373.658765] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 000000000000000f
[ 373.690820] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000cfc
[ 373.722842] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 373.755214] R13: 00002b894b403000 R14: 0000000000000000 R15: 00002b894b4104c0
[ 373.787447] INFO: task fio:3325 blocked for more than 120 seconds.
[ 373.815230] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 373.838263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 373.874051] fio D 0 3325 3252 0x00000080
[ 373.898802] Call Trace:
[ 373.909778] __schedule+0x289/0x8f0
[ 373.925415] schedule+0x36/0x80
[ 373.939503] schedule_preempt_disabled+0xe/0x10
[ 373.959802] __mutex_lock.isra.8+0x266/0x500
[ 373.979022] __mutex_lock_slowpath+0x13/0x20
[ 373.998230] mutex_lock+0x2f/0x40
[ 374.013611] blkdev_put+0x20/0x120
[ 374.031725] blkdev_close+0x25/0x30
[ 374.050176] __fput+0xe7/0x210
[ 374.064775] ____fput+0xe/0x10
[ 374.078489] task_work_run+0x83/0xb0
[ 374.094580] exit_to_usermode_loop+0x59/0x85
[ 374.113768] do_syscall_64+0x165/0x180
[ 374.130553] entry_SYSCALL64_slow_path+0x25/0x25
[ 374.151303] RIP: 0033:0x2b89425194fd
[ 374.167387] RSP: 002b:00002b895ae82c40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 374.201599] RAX: 0000000000000000 RBX: 00002b8943180890 RCX: 00002b89425194fd
[ 374.233708] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 0000000000000037
[ 374.265519] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000cfd
[ 374.297649] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 374.329729] R13: 00002b894b410c00 R14: 0000000000000000 R15: 00002b894b41e0c0
[ 374.361865] INFO: task fio:3327 blocked for more than 120 seconds.
[ 374.389636] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 374.412347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 374.447748] fio D 0 3327 3252 0x00000080
[ 374.472345] Call Trace:
[ 374.483370] __schedule+0x289/0x8f0
[ 374.499146] schedule+0x36/0x80
[ 374.513203] schedule_preempt_disabled+0xe/0x10
[ 374.534772] __mutex_lock.isra.8+0x266/0x500
[ 374.556953] __mutex_lock_slowpath+0x13/0x20
[ 374.577119] mutex_lock+0x2f/0x40
[ 374.591993] blkdev_put+0x20/0x120
[ 374.607965] blkdev_close+0x25/0x30
[ 374.623585] __fput+0xe7/0x210
[ 374.637293] ____fput+0xe/0x10
[ 374.650976] task_work_run+0x83/0xb0
[ 374.667176] exit_to_usermode_loop+0x59/0x85
[ 374.686332] do_syscall_64+0x165/0x180
[ 374.703150] entry_SYSCALL64_slow_path+0x25/0x25
[ 374.723902] RIP: 0033:0x2b89425194fd
[ 374.740073] RSP: 002b:00002b895aa80c40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 374.774171] RAX: 0000000000000000 RBX: 00002b8943180c10 RCX: 00002b89425194fd
[ 374.806303] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 000000000000000a
[ 374.838350] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000cff
[ 374.871310] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 374.903759] R13: 00002b894b42c400 R14: 0000000000000000 R15: 00002b894b4398c0
[ 374.935769] INFO: task fio:3328 blocked for more than 120 seconds.
[ 374.963535] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 374.986330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 375.021111] fio D 0 3328 3252 0x00000080
[ 375.047092] Call Trace:
[ 375.059404] __schedule+0x289/0x8f0
[ 375.076919] schedule+0x36/0x80
[ 375.091092] schedule_preempt_disabled+0xe/0x10
[ 375.111569] __mutex_lock.isra.8+0x266/0x500
[ 375.130372] __mutex_lock_slowpath+0x13/0x20
[ 375.149605] mutex_lock+0x2f/0x40
[ 375.164517] blkdev_put+0x20/0x120
[ 375.179741] blkdev_close+0x25/0x30
[ 375.195456] __fput+0xe7/0x210
[ 375.209262] ____fput+0xe/0x10
[ 375.222946] task_work_run+0x83/0xb0
[ 375.239113] exit_to_usermode_loop+0x59/0x85
[ 375.258416] do_syscall_64+0x165/0x180
[ 375.275285] entry_SYSCALL64_slow_path+0x25/0x25
[ 375.296039] RIP: 0033:0x2b89425194fd
[ 375.312085] RSP: 002b:00002b895a87fc40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 375.346101] RAX: 0000000000000000 RBX: 00002b8943180dd0 RCX: 00002b89425194fd
[ 375.378382] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 0000000000000033
[ 375.411225] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000d00
[ 375.443626] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 375.475713] R13: 00002b894b43a000 R14: 0000000000000000 R15: 00002b894b4474c0
[ 375.507788] INFO: task fio:3329 blocked for more than 120 seconds.
[ 375.535718] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 375.560678] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 375.600320] fio D 0 3329 3252 0x00000080
[ 375.626374] Call Trace:
[ 375.638002] __schedule+0x289/0x8f0
[ 375.654503] schedule+0x36/0x80
[ 375.669362] schedule_preempt_disabled+0xe/0x10
[ 375.690733] __mutex_lock.isra.8+0x266/0x500
[ 375.710360] __mutex_lock_slowpath+0x13/0x20
[ 375.730588] mutex_lock+0x2f/0x40
[ 375.745960] blkdev_put+0x20/0x120
[ 375.761654] blkdev_close+0x25/0x30
[ 375.777527] __fput+0xe7/0x210
[ 375.791235] ____fput+0xe/0x10
[ 375.804915] task_work_run+0x83/0xb0
[ 375.820962] exit_to_usermode_loop+0x59/0x85
[ 375.840572] do_syscall_64+0x165/0x180
[ 375.857423] entry_SYSCALL64_slow_path+0x25/0x25
[ 375.877716] RIP: 0033:0x2b89425194fd
[ 375.894374] RSP: 002b:00002b895a67ec40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 375.928733] RAX: 0000000000000000 RBX: 00002b8943180f90 RCX: 00002b89425194fd
[ 375.960830] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 0000000000000012
[ 375.992567] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000d01
[ 376.024255] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 376.057209] R13: 00002b894b447c00 R14: 0000000000000000 R15: 00002b894b4550c0
[ 376.094684] INFO: task fio:3330 blocked for more than 120 seconds.
[ 376.122962] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 376.145629] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 376.180921] fio D 0 3330 3252 0x00000080
[ 376.205618] Call Trace:
[ 376.216588] __schedule+0x289/0x8f0
[ 376.232522] schedule+0x36/0x80
[ 376.246584] schedule_preempt_disabled+0xe/0x10
[ 376.266981] __mutex_lock.isra.8+0x266/0x500
[ 376.286200] __mutex_lock_slowpath+0x13/0x20
[ 376.305350] mutex_lock+0x2f/0x40
[ 376.320234] blkdev_put+0x20/0x120
[ 376.335129] blkdev_close+0x25/0x30
[ 376.350811] __fput+0xe7/0x210
[ 376.364524] ____fput+0xe/0x10
[ 376.378272] task_work_run+0x83/0xb0
[ 376.394276] exit_to_usermode_loop+0x59/0x85
[ 376.413504] do_syscall_64+0x165/0x180
[ 376.430381] entry_SYSCALL64_slow_path+0x25/0x25
[ 376.451181] RIP: 0033:0x2b89425194fd
[ 376.467187] RSP: 002b:00002b895a47dc40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 376.501281] RAX: 0000000000000000 RBX: 00002b8943181150 RCX: 00002b89425194fd
[ 376.533460] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 000000000000001b
[ 376.565546] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000d02
[ 376.602073] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 376.634662] R13: 00002b894b455800 R14: 0000000000000000 R15: 00002b894b462cc0
[ 376.666879] INFO: task fio:3331 blocked for more than 120 seconds.
[ 376.694623] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 376.717318] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 376.752846] fio D 0 3331 3252 0x00000080
[ 376.777775] Call Trace:
[ 376.788747] __schedule+0x289/0x8f0
[ 376.804426] schedule+0x36/0x80
[ 376.818548] schedule_preempt_disabled+0xe/0x10
[ 376.838867] __mutex_lock.isra.8+0x266/0x500
[ 376.858245] __mutex_lock_slowpath+0x13/0x20
[ 376.877437] mutex_lock+0x2f/0x40
[ 376.892312] blkdev_put+0x20/0x120
[ 376.907705] blkdev_close+0x25/0x30
[ 376.924015] __fput+0xe7/0x210
[ 376.937845] ____fput+0xe/0x10
[ 376.951535] task_work_run+0x83/0xb0
[ 376.967630] exit_to_usermode_loop+0x59/0x85
[ 376.986804] do_syscall_64+0x165/0x180
[ 377.003710] entry_SYSCALL64_slow_path+0x25/0x25
[ 377.024454] RIP: 0033:0x2b89425194fd
[ 377.040191] RSP: 002b:00002b895a27cc40 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 377.074447] RAX: 0000000000000000 RBX: 00002b8943181310 RCX: 00002b89425194fd
[ 377.110910] RDX: 00002b89415c8000 RSI: 0000000000000080 RDI: 0000000000000004
[ 377.143293] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000d03
[ 377.175001] R10: 6e493d726f727265 R11: 0000000000000293 R12: 0000000000000000
[ 377.205372] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 377.205394] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 377.206229] nvme nvme0: Failed reconnect attempt 22
[ 377.206231] nvme nvme0: Reconnecting in 10 seconds...
[ 377.308015] R13: 00002b894b463400 R14: 0000000000000000 R15: 00002b894b4708c0
[ 377.340061] INFO: task fio:3332 blocked for more than 120 seconds.
[ 377.368235] Not tainted 4.11.0-rc3.ctrl_tmo+ #1
[ 377.390954] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 377.426117] fio D 0 3332 3252 0x00000080
[ 377.450821] Call Trace:
[ 377.461740] __schedule+0x289/0x8f0
[ 377.477483] ? bit_wait+0x50/0x50
[ 377.492389] schedule+0x36/0x80
[ 377.506526] io_schedule+0x16/0x40
[ 377.521756] bit_wait_io+0x11/0x50
[ 377.537329] __wait_on_bit+0x64/0x90
[ 377.553385] ? bit_wait+0x50/0x50
[ 377.568312] out_of_line_wait_on_bit+0x81/0xb0
[ 377.588802] ? autoremove_wake_function+0x60/0x60
[ 377.614016] __block_write_begin_int+0x3cf/0x6c0
[ 377.637191] ? I_BDEV+0x20/0x20
[ 377.651456] ? I_BDEV+0x20/0x20
[ 377.665628] block_write_begin+0x49/0x90
[ 377.683410] blkdev_write_begin+0x23/0x30
[ 377.701436] generic_perform_write+0xca/0x1c0
[ 377.720995] ? file_update_time+0x5e/0x110
[ 377.740096] __generic_file_write_iter+0x19b/0x1e0
[ 377.762660] blkdev_write_iter+0x8a/0x100
[ 377.781780] ? __inode_security_revalidate+0x4f/0x60
[ 377.805212] __vfs_write+0xe3/0x160
[ 377.821172] vfs_write+0xb2/0x1b0
[ 377.836228] ? syscall_trace_enter+0x1d0/0x2b0
[ 377.856432] SyS_pwrite64+0x87/0xb0
[ 377.872541] do_syscall_64+0x67/0x180
[ 377.888976] entry_SYSCALL64_slow_path+0x25/0x25
[ 377.909777] RIP: 0033:0x2b8942519d63
[ 377.925799] RSP: 002b:00002b895a07bc00 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
[ 377.960704] RAX: ffffffffffffffda RBX: 00002b899000ad40 RCX: 00002b8942519d63
[ 377.992782] RDX: 0000000000000400 RSI: 00002b8990002920 RDI: 0000000000000031
[ 378.024525] RBP: 00002b894b471000 R08: 0000000000000000 R09: 0000000000000000
[ 378.056661] R10: 00000000c6946000 R11: 0000000000000293 R12: 00002b894b471008
[ 378.088923] R13: 0000000000000400 R14: 00002b899000ad68 R15: 00002b899000ad50
[ 387.445743] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 387.481444] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 387.509486] nvme nvme0: Failed reconnect attempt 23
[ 387.531502] nvme nvme0: Reconnecting in 10 seconds...
[ 397.686098] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 397.719849] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 397.749892] nvme nvme0: Failed reconnect attempt 24
--snip--
[ 756.182567] nvme nvme0: Reconnecting in 10 seconds...
[ 766.336578] nvme nvme0: Connect rejected: status 8 (invalid service ID).
[ 766.371583] nvme nvme0: rdma_resolve_addr wait failed (-104).
[ 766.400827] nvme nvme0: Failed reconnect attempt 60
[ 766.423690] nvme nvme0: Removing controller...
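A log like the one above can be summarized with a short script. This is only a sketch: the function name summarize and the two regular expressions are illustrative assumptions, written against the message formats visible in this log ("Failed reconnect attempt N" and "INFO: task NAME:PID blocked for more than N seconds"). It reports the last reconnect attempt seen per controller and the tasks flagged by the hung task detector.

```python
import re

# Patterns written against the message formats seen in this log (assumed stable).
ATTEMPT_RE = re.compile(r"nvme (\S+): Failed reconnect attempt (\d+)")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize(log_text):
    """Return (last reconnect attempt per controller, list of hung (task, pid))."""
    last_attempt = {}
    hung = []
    for line in log_text.splitlines():
        match = ATTEMPT_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the final value is the last attempt.
            last_attempt[match.group(1)] = int(match.group(2))
            continue
        match = HUNG_RE.search(line)
        if match:
            hung.append((match.group(1), int(match.group(2))))
    return last_attempt, hung


# A few lines copied from the log above, used as sample input.
sample = """\
[ 367.029926] nvme nvme0: Failed reconnect attempt 21
[ 371.444001] INFO: task kworker/u130:1:155 blocked for more than 120 seconds.
[ 373.210167] INFO: task fio:3324 blocked for more than 120 seconds.
[ 766.400827] nvme nvme0: Failed reconnect attempt 60
"""

attempts, hung_tasks = summarize(sample)
print(attempts)
print(hung_tasks)
```

Feeding it the full dmesg output instead of the sample would list every blocked task in one place, which makes it easier to see that the fio threads and systemd-udevd are all stuck while the controller is still reconnecting.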
Best Regards,
Yi Zhang
----- Original Message -----
From: "Sagi Grimberg" <sagi at grimberg.me>
To: linux-nvme at lists.infradead.org
Cc: "Christoph Hellwig" <hch at lst.de>, "Yi Zhang" <yizhan at redhat.com>
Sent: Sunday, March 19, 2017 6:42:18 AM
Subject: [PATCH 0/3] Introduce fabrics controller loss timeout
When a host realizes that its controller session is
damaged, it schedules periodic reconnects. If the controller
is gone and will never return, we need a stop condition to give
up on this controller and simply remove it.
We allow the user to configure a suitable ctrl_loss_tmo and
set a reasonable default of 10 minutes.
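The relationship between the timeout and the number of reconnect attempts can be sketched as simple arithmetic. This is an illustration, not the patch's code: the helper name max_reconnects is hypothetical, rounding up is an assumption, and the 10-second delay is taken from the "Reconnecting in 10 seconds..." messages in the log reported in this thread.

```python
def max_reconnects(ctrl_loss_tmo, reconnect_delay):
    """Reconnect attempts that fit in ctrl_loss_tmo seconds, rounded up.

    Hypothetical helper: both arguments are in seconds and must be positive.
    """
    if ctrl_loss_tmo <= 0 or reconnect_delay <= 0:
        raise ValueError("both values must be positive")
    # Ceiling division using integers only.
    return -(-ctrl_loss_tmo // reconnect_delay)


# Default of 10 minutes with a 10-second delay between attempts.
default_attempts = max_reconnects(10 * 60, 10)
print(default_attempts)
```

Under these assumptions the default works out to 60 attempts, which is consistent with the reported log, where the controller is removed right after "Failed reconnect attempt 60".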
We'll need a complementary nvme-cli exposure that will follow.
Sagi Grimberg (3):
  nvme-rdma: get rid of local reconnect_delay
  nvme-fabrics: Allow ctrl loss timeout configuration
  nvme-rdma: Support ctrl_loss_tmo

 drivers/nvme/host/fabrics.c | 28 ++++++++++++++++++++++++++++
 drivers/nvme/host/fabrics.h | 10 ++++++++++
 drivers/nvme/host/rdma.c    | 43 ++++++++++++++++++++++++++++---------------
 3 files changed, 66 insertions(+), 15 deletions(-)
--
2.7.4