[bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
Yi Zhang
yi.zhang at redhat.com
Sun Jul 3 22:42:50 PDT 2022
Updating the subject to better describe the issue.
I also tried this on an nvme/rdma environment and it is reproducible
there as well. Here are the steps:
# echo 0 >/sys/devices/system/cpu/cpu0/online
# dmesg | tail -10
[ 781.577235] smpboot: CPU 0 is now offline
# nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
no controller found: failed to write to nvme-fabrics device
# dmesg
[ 781.577235] smpboot: CPU 0 is now offline
[ 799.471627] nvme nvme0: creating 39 I/O queues.
[ 801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
[ 801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[ 801.073059] nvme nvme0: failed to connect queue: 1 ret=-18
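A note on the two numbers in the log above: -16402 and -18 look like the same error. The sketch below assumes the "error wo/DNR bit" message prints the return value with the NVMe DNR status bit masked off, and that the DNR bit is 0x4000; both are assumptions about the driver, not something taken from this log. Under those assumptions, masking -18 (-EXDEV, "Invalid cross-device link", which matches the nvme-cli error) gives exactly -16402.

```shell
#!/usr/bin/env bash
# Assumption: the driver logs (errval & ~NVME_SC_DNR), with NVME_SC_DNR = 0x4000.
NVME_SC_DNR=$((0x4000))

# Clear the DNR bit, as the "error wo/DNR bit" message appears to do.
strip_dnr() {
    echo $(( $1 & ~NVME_SC_DNR ))
}

# Reverse step for a negative errno: set the DNR bit again.
restore_dnr() {
    echo $(( $1 | NVME_SC_DNR ))
}

strip_dnr -18        # prints -16402
restore_dnr -16402   # prints -18
```

If that reading is right, the connect command is failing with a plain -EXDEV from the host side rather than with an NVMe status code returned by the target.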
On Thu, Jun 30, 2022 at 2:02 PM Yi Zhang <yi.zhang at redhat.com> wrote:
>
> Hello
> I found this issue when running blktests after offlining a CPU on
> linux-block/for-next. Here are the steps and the dmesg log.
> From the log, the test failed at the target connect step. Feel free
> to let me know if you need any more info or testing, thanks.
>
> # echo 0 >/sys/devices/system/cpu/cpu0/online
> # ./check nvme/004
> nvme/004 (test nvme and nvmet UUID NS descriptors) [failed]
> runtime ... 0.725s
> --- tests/nvme/004.out 2022-06-30 01:50:53.637275584 -0400
> +++ /root/blktests/results/nodev/nvme/004.out.bad 2022-06-30 01:55:22.321399448 -0400
> @@ -1,5 +1,7 @@
> Running nvme/004
> -91fdba0d-f87b-4c25-b80f-db7be1418b9e
> -uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
> -NQN:blktests-subsystem-1 disconnected 1 controller(s)
> +Failed to write to /dev/nvme-fabrics: Invalid cross-device link
> +cat: '/sys/class/nvme/nvme*/subsysnqn': No such file or directory
> +cat: /sys/block/n1/uuid: No such file or directory
> ...
> (Run 'diff -u tests/nvme/004.out /root/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)
> # dmesg
> [ 1526.169417] numa_remove_cpu cpu 0 node 0: mask now 1-31
> [ 1526.170619] smpboot: CPU 0 is now offline
> [ 1531.030430] loop: module loaded
> [ 1531.115255] run blktests nvme/004 at 2022-06-30 01:55:21
> [ 1531.305557] loop0: detected capacity change from 0 to 2097152
> [ 1531.354299] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [ 1531.402815] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
> [ 1531.404124] nvme nvme0: creating 31 I/O queues.
> [ 1531.448181] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
>
> --
> Best Regards,
> Yi Zhang
--
Best Regards,
Yi Zhang