nvme-discover: Computer freezes up when persistent device loses connection to controller

Belanger, Martin Martin.Belanger at dell.com
Fri Feb 26 13:41:45 EST 2021


I was experimenting with persistent TCP connections using nvme-cli ("nvme discover"), nvme-tcp, and nvmet-tcp, and my computer froze.

The steps are as follows (detailed commands below):
1) Configure nvmet to listen for TCP connections at 127.0.0.1:8009 and configure 2 storage subsystems.
2) Use "nvme discover" to set up a persistent TCP connection to 127.0.0.1:8009
3) Confirm that a persistent device was created (/dev/nvme1)
4) Confirm that we can retrieve log pages from persistent nvme1
5) Remove all nvmet configuration.
6) Confirm that the persistent device /dev/nvme1 is still there. 
7) Try to retrieve log pages from persistent device nvme1 
8) Computer freezes

Detailed operations:
 
1) Configure nvmet to listen for TCP connections at 127.0.0.1:8009 and configure 2 storage subsystems
# modprobe null_blk nr_devices=2
# modprobe nvmet-tcp
 
# mkdir       /sys/kernel/config/nvmet/subsystems/storage1
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage1/attr_allow_any_host
 
# mkdir                 /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1
# echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1/device_path
# echo -n 1           > /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1/enable
 
# mkdir       /sys/kernel/config/nvmet/subsystems/storage2
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage2/attr_allow_any_host
 
# mkdir                 /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1
# echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1/device_path
# echo -n 1           > /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1/enable
 
# mkdir /sys/kernel/config/nvmet/ports/1
# echo -n 127.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
# echo -n tcp       > /sys/kernel/config/nvmet/ports/1/addr_trtype
# echo -n 8009      > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
# echo -n ipv4      > /sys/kernel/config/nvmet/ports/1/addr_adrfam
 
# cd /sys/kernel/config/nvmet/ports/1/subsystems
# ln -s ../../../subsystems/storage1 storage1
# ln -s ../../../subsystems/storage2 storage2
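
The two subsystem blocks in step 1 differ only in the subsystem name, so they could be folded into one helper. This is only a sketch and not part of the original reproduction: the function name nvmet_add_subsys and the NVMET_CFG override are hypothetical, and it assumes it runs as root with nvmet loaded and configfs mounted at the usual location.

```shell
# Hypothetical helper (not from the original post): create one nvmet
# subsystem with a single namespace backed by the given block device.
# NVMET_CFG is an assumed override for the configfs root, so the
# function can be exercised against a scratch directory.
nvmet_add_subsys() {
    name="$1"
    dev="$2"
    root="${NVMET_CFG:-/sys/kernel/config/nvmet}"
    subsys="$root/subsystems/$name"

    # Creates the subsystem directory and namespace 1 beneath it.
    mkdir -p "$subsys/namespaces/1" || return 1

    # printf '%s' writes the value without a trailing newline,
    # matching the "echo -n" used in the manual steps.
    printf '%s' 1      > "$subsys/attr_allow_any_host"
    printf '%s' "$dev" > "$subsys/namespaces/1/device_path"
    printf '%s' 1      > "$subsys/namespaces/1/enable"
}

# Usage (as root), mirroring the manual steps:
#   nvmet_add_subsys storage1 /dev/nullb0
#   nvmet_add_subsys storage2 /dev/nullb0
```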
 
2) Use nvme-cli to set up a persistent TCP connection to 127.0.0.1:8009
$ sudo nvme discover -g -G -o json -t tcp -a 127.0.0.1 -s 8009 --persistent
{
  "device" : "nvme1",
  "genctr" : 4,
  "records" : [
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage1",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    },
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage2",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    }
  ]
}
 
3) Confirm that a new device was created (/dev/nvme1)
$ ls -l /dev/nvme*
crw------- 1 root root 240,  0 Feb 26 11:09 /dev/nvme0
brw-rw---- 1 root disk 259,  0 Feb 26 11:09 /dev/nvme0n1
brw-rw---- 1 root disk 259,  1 Feb 26 11:09 /dev/nvme0n1p1
brw-rw---- 1 root disk 259,  2 Feb 26 11:09 /dev/nvme0n1p2
brw-rw---- 1 root disk 259,  3 Feb 26 11:09 /dev/nvme0n1p3
brw-rw---- 1 root disk 259,  4 Feb 26 11:09 /dev/nvme0n1p4
brw-rw---- 1 root disk 259,  5 Feb 26 11:09 /dev/nvme0n1p5
crw------- 1 root root 240,  1 Feb 26 11:16 /dev/nvme1
crw------- 1 root root  10, 33 Feb 26 11:11 /dev/nvme-fabrics
 
4) Confirm that we can retrieve log pages from persistent /dev/nvme1
$ sudo nvme discover -d nvme1 -o json
{
  "device" : "nvme1",
  "genctr" : 4,
  "records" : [
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage1",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    },
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage2",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    }
  ]
}
 
5) Remove all nvmet configuration
# rm -f /sys/kernel/config/nvmet/ports/1/subsystems/storage1
# rm -f /sys/kernel/config/nvmet/ports/1/subsystems/storage2
# rmdir /sys/kernel/config/nvmet/ports/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage1
# rmdir /sys/kernel/config/nvmet/subsystems/storage2
 
6) Confirm that the persistent device /dev/nvme1 is still there 
$ ls -l /dev/nvme*
crw------- 1 root root 240,  0 Feb 26 11:09 /dev/nvme0
brw-rw---- 1 root disk 259,  0 Feb 26 11:09 /dev/nvme0n1
brw-rw---- 1 root disk 259,  1 Feb 26 11:09 /dev/nvme0n1p1
brw-rw---- 1 root disk 259,  2 Feb 26 11:09 /dev/nvme0n1p2
brw-rw---- 1 root disk 259,  3 Feb 26 11:09 /dev/nvme0n1p3
brw-rw---- 1 root disk 259,  4 Feb 26 11:09 /dev/nvme0n1p4
brw-rw---- 1 root disk 259,  5 Feb 26 11:09 /dev/nvme0n1p5
crw------- 1 root root 240,  1 Feb 26 11:32 /dev/nvme1    <- Should this device remain if the connection is lost?
crw------- 1 root root  10, 33 Feb 26 11:11 /dev/nvme-fabrics
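
The device node alone does not show whether the controller behind it is still connected. One possible extra check, which was not run as part of this reproduction, is to read the controller state that the kernel exposes in sysfs (values such as "live" or "connecting"). The helper name nvme_ctrl_state and the NVME_SYSFS override below are hypothetical.

```shell
# Hypothetical helper (not from the original post): print the state of
# an NVMe controller as reported by sysfs, or "absent" if the
# controller has no readable state attribute.
# NVME_SYSFS is an assumed override for the sysfs class directory.
nvme_ctrl_state() {
    ctrl="$1"
    state_file="${NVME_SYSFS:-/sys/class/nvme}/${ctrl}/state"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        echo "absent"
    fi
}

# Usage: only query the discovery controller when it reports "live".
#   if [ "$(nvme_ctrl_state nvme1)" = "live" ]; then
#       sudo nvme discover --device nvme1 -o json
#   fi
```

Recording the state just before step 7 would show whether the freeze happens while the controller is still trying to reconnect.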
 
 
7) Try to retrieve log pages from /dev/nvme1
$ sudo nvme discover --device nvme1 -o json
 
8) !!!! COMPUTER FREEZES !!!!

Regards,

Martin Belanger
Dell Inc.

