nvme-discover: Computer freezes up when persistent device loses connection to controller
Belanger, Martin
Martin.Belanger at dell.com
Sat Feb 27 09:57:21 EST 2021
Update.
I reported this issue after experimenting on Ubuntu 20.04 (kernel 5.8).
I repeated the same experiment on Fedora 33 (kernel 5.10) and I'm not seeing the freeze.
It seems that the issue was resolved somewhere between kernel 5.8 and 5.10.
Regards,
Martin
________________________________________
From: Belanger, Martin <Martin_Belanger at Dell.com>
Sent: Friday, February 26, 2021 13:41
To: linux-nvme at lists.infradead.org
Subject: nvme-discover: Computer freezes up when persistent device loses connection to controller
I was experimenting with persistent tcp connections using nvme-discover/nvme-tcp and nvmet-tcp and caused my computer to freeze up.
The steps are as follows (detailed commands below):
1) Configure nvmet to listen for tcp connection at 127.0.0.1:8009 and configure 2 storage subsystems.
2) Use nvme-discover to set up a persistent tcp connection to 127.0.0.1:8009
3) Confirm that a persistent device was created (/dev/nvme1)
4) Confirm that we can retrieve log pages from persistent nvme1
5) Remove all nvmet configuration.
6) Confirm that the persistent device /dev/nvme1 is still there.
7) Try to retrieve log pages from persistent device nvme1
8) Computer freezes
Detailed operations:
1) Configure nvmet to listen for tcp connection at 127.0.0.1:8009 and configure 2 storage subsystems
# modprobe null_blk nr_devices=2
# modprobe nvmet-tcp
# mkdir /sys/kernel/config/nvmet/subsystems/storage1
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage1/attr_allow_any_host
# mkdir /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1
# echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1/device_path
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1/enable
# mkdir /sys/kernel/config/nvmet/subsystems/storage2
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage2/attr_allow_any_host
# mkdir /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1
# echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1/device_path
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1/enable
# mkdir /sys/kernel/config/nvmet/ports/1
# echo -n 127.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
# echo -n tcp > /sys/kernel/config/nvmet/ports/1/addr_trtype
# echo -n 8009 > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
# echo -n ipv4 > /sys/kernel/config/nvmet/ports/1/addr_adrfam
# cd /sys/kernel/config/nvmet/ports/1/subsystems
# ln -s ../../../subsystems/storage1 storage1
# ln -s ../../../subsystems/storage2 storage2
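For repeated runs, the configfs steps above could be wrapped in two small helpers. This is only a sketch: the function names and the NVMET_ROOT override are hypothetical (the override exists so the helpers can be dry-run against a scratch directory), and the modules are assumed to be loaded already with the modprobe commands shown above.

```shell
#!/bin/sh
# Sketch only. NVMET_ROOT is a hypothetical override; leave it unset on a
# real system so the default configfs path is used.
NVMET_ROOT="${NVMET_ROOT:-/sys/kernel/config/nvmet}"

# Create one subsystem with a single namespace backed by the given device.
# $1 = subsystem name (NQN), $2 = backing block device
nvmet_add_subsystem() {
    name="$1"; dev="$2"
    sub="$NVMET_ROOT/subsystems/$name"
    mkdir -p "$sub/namespaces/1" || return 1
    printf '%s' 1 > "$sub/attr_allow_any_host" || return 1
    printf '%s' "$dev" > "$sub/namespaces/1/device_path" || return 1
    printf '%s' 1 > "$sub/namespaces/1/enable"
}

# Create a TCP/IPv4 port and export the listed subsystems through it.
# $1 = port id, $2 = listen address, $3 = service id, rest = subsystem names
nvmet_add_tcp_port() {
    id="$1"; addr="$2"; svcid="$3"; shift 3
    port="$NVMET_ROOT/ports/$id"
    mkdir -p "$port/subsystems" || return 1
    printf '%s' "$addr" > "$port/addr_traddr" || return 1
    printf '%s' tcp > "$port/addr_trtype" || return 1
    printf '%s' "$svcid" > "$port/addr_trsvcid" || return 1
    printf '%s' ipv4 > "$port/addr_adrfam" || return 1
    for name in "$@"; do
        ln -s "$NVMET_ROOT/subsystems/$name" "$port/subsystems/$name" || return 1
    done
}
```

The setup used in this report would then be, as root: nvmet_add_subsystem storage1 /dev/nullb0; nvmet_add_subsystem storage2 /dev/nullb0; nvmet_add_tcp_port 1 127.0.0.1 8009 storage1 storage2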
2) Use nvme-cli to set up a persistent tcp connection to 127.0.0.1:8009
$ sudo nvme discover -g -G -o json -t tcp -a 127.0.0.1 -s 8009 --persistent
{
  "device" : "nvme1",
  "genctr" : 4,
  "records" : [
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage1",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    },
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage2",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    }
  ]
}
3) Confirm that a new device was created (/dev/nvme1)
$ ls -l /dev/nvme*
crw------- 1 root root 240, 0 Feb 26 11:09 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Feb 26 11:09 /dev/nvme0n1
brw-rw---- 1 root disk 259, 1 Feb 26 11:09 /dev/nvme0n1p1
brw-rw---- 1 root disk 259, 2 Feb 26 11:09 /dev/nvme0n1p2
brw-rw---- 1 root disk 259, 3 Feb 26 11:09 /dev/nvme0n1p3
brw-rw---- 1 root disk 259, 4 Feb 26 11:09 /dev/nvme0n1p4
brw-rw---- 1 root disk 259, 5 Feb 26 11:09 /dev/nvme0n1p5
crw------- 1 root root 240, 1 Feb 26 11:16 /dev/nvme1
crw------- 1 root root 10, 33 Feb 26 11:11 /dev/nvme-fabrics
4) Confirm that we can retrieve log pages from persistent /dev/nvme1
$ sudo nvme discover -d nvme1 -o json
{
  "device" : "nvme1",
  "genctr" : 4,
  "records" : [
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage1",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    },
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage2",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    }
  ]
}
5) Remove all nvmet configuration
# rm -f /sys/kernel/config/nvmet/ports/1/subsystems/storage1
# rm -f /sys/kernel/config/nvmet/ports/1/subsystems/storage2
# rmdir /sys/kernel/config/nvmet/ports/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage1
# rmdir /sys/kernel/config/nvmet/subsystems/storage2
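The removal order above matters: port links first, then ports, then namespaces, then subsystems. A minimal sketch of a helper that prints the teardown commands in that order for whatever is currently configured, so they can be reviewed before being run. The function name and the NVMET_ROOT override are hypothetical, and the output is not quoted, so it assumes names without whitespace.

```shell
#!/bin/sh
# Sketch only. NVMET_ROOT is a hypothetical override for dry runs.
NVMET_ROOT="${NVMET_ROOT:-/sys/kernel/config/nvmet}"

# Print (do not execute) the commands needed to remove all nvmet
# configuration, in dependency order.
nvmet_teardown_plan() {
    # 1. Unlink subsystems from every port.
    for link in "$NVMET_ROOT"/ports/*/subsystems/*; do
        if [ -L "$link" ]; then printf 'rm -f %s\n' "$link"; fi
    done
    # 2. Remove the ports themselves.
    for port in "$NVMET_ROOT"/ports/*; do
        if [ -d "$port" ]; then printf 'rmdir %s\n' "$port"; fi
    done
    # 3. Remove namespaces, then 4. their parent subsystems.
    for ns in "$NVMET_ROOT"/subsystems/*/namespaces/*; do
        if [ -d "$ns" ]; then printf 'rmdir %s\n' "$ns"; fi
    done
    for sub in "$NVMET_ROOT"/subsystems/*; do
        if [ -d "$sub" ]; then printf 'rmdir %s\n' "$sub"; fi
    done
    return 0
}
```

Once the printed plan looks right, it can be applied as root with: nvmet_teardown_plan | sh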
6) Confirm that the persistent device /dev/nvme1 is still there
$ ls -l /dev/nvme*
crw------- 1 root root 240, 0 Feb 26 11:09 /dev/nvme0
brw-rw---- 1 root disk 259, 0 Feb 26 11:09 /dev/nvme0n1
brw-rw---- 1 root disk 259, 1 Feb 26 11:09 /dev/nvme0n1p1
brw-rw---- 1 root disk 259, 2 Feb 26 11:09 /dev/nvme0n1p2
brw-rw---- 1 root disk 259, 3 Feb 26 11:09 /dev/nvme0n1p3
brw-rw---- 1 root disk 259, 4 Feb 26 11:09 /dev/nvme0n1p4
brw-rw---- 1 root disk 259, 5 Feb 26 11:09 /dev/nvme0n1p5
crw------- 1 root root 240, 1 Feb 26 11:32 /dev/nvme1 <- Should this device remain if the connection is lost?
crw------- 1 root root 10, 33 Feb 26 11:11 /dev/nvme-fabrics
7) Try to retrieve log pages from /dev/nvme1
$ sudo nvme discover --device nvme1 -o json
8) !!!! COMPUTER IS BRICKED !!!!
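A possible precaution on an affected kernel, sketched below, is to read the controller state from sysfs and only query the persistent device when it reports "live". This does not address the freeze itself, and it is an assumption (not verified on kernel 5.8) that the controller leaves the "live" state once the target configuration is removed. The function names and the NVME_SYSFS override are hypothetical.

```shell
#!/bin/sh
# Sketch only. NVME_SYSFS is a hypothetical override for testing; the
# default is where the kernel exposes NVMe controllers.
NVME_SYSFS="${NVME_SYSFS:-/sys/class/nvme}"

# Print the controller state (e.g. "live", "connecting") or "missing"
# when the controller has no readable state attribute.
# $1 = controller name, e.g. nvme1
nvme_ctrl_state() {
    f="$NVME_SYSFS/$1/state"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo missing
    fi
}

# Succeed only when the controller reports "live".
nvme_ctrl_is_live() {
    [ "$(nvme_ctrl_state "$1")" = "live" ]
}
```

Step 7 would then become: nvme_ctrl_is_live nvme1 && sudo nvme discover --device nvme1 -o json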
Regards,
Martin Belanger
Dell Inc.