[bug report] nvme/rdma: nvme connect failed after offline one cpu on host side

Yi Zhang yi.zhang at redhat.com
Sun Jul 3 22:42:50 PDT 2022


Updating the subject to better describe the issue.

I also tried this on an nvme/rdma environment, and the issue is
reproducible there as well. Here are the steps:

# echo 0 >/sys/devices/system/cpu/cpu0/online
# dmesg | tail -10
[  781.577235] smpboot: CPU 0 is now offline
# nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
no controller found: failed to write to nvme-fabrics device

# dmesg
[  781.577235] smpboot: CPU 0 is now offline
[  799.471627] nvme nvme0: creating 39 I/O queues.
[  801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
[  801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[  801.073059] nvme nvme0: failed to connect queue: 1 ret=-18
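
A note on the two error values in the log above: they appear to be the
same error. "Invalid cross-device link" is the strerror text for EXDEV
(errno 18 on Linux), which matches ret=-18. The -16402 in the "Connect
command failed" line is consistent with that same -18 after the NVMe
"Do Not Retry" status bit (NVME_SC_DNR, 0x4000) has been masked off. That
the message is produced by masking with ~NVME_SC_DNR is an assumption
based on the message wording and the arithmetic, not something confirmed
in this thread. A minimal sketch of the arithmetic (the helper name
strip_dnr is hypothetical):

```python
import errno
import os

# "Do Not Retry" bit of the NVMe status field, as defined in the kernel's
# include/linux/nvme.h.
NVME_SC_DNR = 0x4000


def strip_dnr(errval: int) -> int:
    """Clear the DNR bit from a status/return value.

    Assumption: this mirrors how the "error wo/DNR bit" value is computed.
    """
    return errval & ~NVME_SC_DNR


ret = -18                  # value from "failed to connect queue: 1 ret=-18"
logged = strip_dnr(ret)    # value expected after "error wo/DNR bit:"

print("ret:", ret)
print("without DNR bit:", logged)
print("EXDEV on this system:", errno.EXDEV, os.strerror(errno.EXDEV))
```

Because -18 is a negative errno rather than an NVMe status code, clearing
bit 14 of its two's-complement representation turns it into -16402, so the
number by itself carries no extra information beyond EXDEV.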

On Thu, Jun 30, 2022 at 2:02 PM Yi Zhang <yi.zhang at redhat.com> wrote:
>
> Hello
> I found this issue when running blktests on linux-block/for-next after
> taking a CPU offline. The steps and dmesg log are below; from the log,
> the test failed at the target connect. Feel free to let me know if you
> need any more info or testing, thanks.
>
> # echo 0 >/sys/devices/system/cpu/cpu0/online
> # ./check nvme/004
> nvme/004 (test nvme and nvmet UUID NS descriptors)           [failed]
>     runtime    ...  0.725s
>     --- tests/nvme/004.out 2022-06-30 01:50:53.637275584 -0400
>     +++ /root/blktests/results/nodev/nvme/004.out.bad 2022-06-30 01:55:22.321399448 -0400
>     @@ -1,5 +1,7 @@
>      Running nvme/004
>     -91fdba0d-f87b-4c25-b80f-db7be1418b9e
>     -uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
>     -NQN:blktests-subsystem-1 disconnected 1 controller(s)
>     +Failed to write to /dev/nvme-fabrics: Invalid cross-device link
>     +cat: '/sys/class/nvme/nvme*/subsysnqn': No such file or directory
>     +cat: /sys/block/n1/uuid: No such file or directory
>     ...
>     (Run 'diff -u tests/nvme/004.out /root/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)
> # dmesg
> [ 1526.169417] numa_remove_cpu cpu 0 node 0: mask now 1-31
> [ 1526.170619] smpboot: CPU 0 is now offline
> [ 1531.030430] loop: module loaded
> [ 1531.115255] run blktests nvme/004 at 2022-06-30 01:55:21
> [ 1531.305557] loop0: detected capacity change from 0 to 2097152
> [ 1531.354299] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [ 1531.402815] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
> [ 1531.404124] nvme nvme0: creating 31 I/O queues.
> [ 1531.448181] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
>
> --
> Best Regards,
>   Yi Zhang



-- 
Best Regards,
  Yi Zhang



