nvme-discover: Computer freezes up when persistent device loses connection to controller

Belanger, Martin Martin.Belanger at dell.com
Sat Feb 27 09:57:21 EST 2021


Update.

I reported this issue after experimenting on Ubuntu 20.04 (kernel 5.8).

I repeated the same experiment on Fedora 33 (kernel 5.10) and I'm not seeing the freeze.

It seems that the issue was resolved somewhere between kernels 5.8 and 5.10.

Regards,
Martin

________________________________________
From: Belanger, Martin <Martin_Belanger at Dell.com>
Sent: Friday, February 26, 2021 13:41
To: linux-nvme at lists.infradead.org
Subject: nvme-discover: Computer freezes up when persistent device loses connection to controller

I was experimenting with persistent tcp connections using nvme-discover/nvme-tcp and nvmet-tcp and caused my computer to freeze up.

The steps are as follows (detailed commands below):
1) Configure nvmet to listen for tcp connection at 127.0.0.1:8009 and configure 2 storage subsystems.
2) Use nvme-discover to set up a persistent tcp connection to 127.0.0.1:8009
3) Confirm that a persistent device was created (/dev/nvme1)
4) Confirm that we can retrieve log pages from persistent nvme1
5) Remove all nvmet configuration.
6) Confirm that the persistent device /dev/nvme1 is still there.
7) Try to retrieve log pages from persistent device nvme1
8) Computer freezes

Detailed operations:

1) Configure nvmet to listen for tcp connection at 127.0.0.1:8009 and configure 2 storage subsystems
# modprobe null_blk nr_devices=2
# modprobe nvmet-tcp

# mkdir       /sys/kernel/config/nvmet/subsystems/storage1
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage1/attr_allow_any_host

# mkdir                 /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1
# echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1/device_path
# echo -n 1           > /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1/enable

# mkdir       /sys/kernel/config/nvmet/subsystems/storage2
# echo -n 1 > /sys/kernel/config/nvmet/subsystems/storage2/attr_allow_any_host

# mkdir                 /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1
# echo -n /dev/nullb0 > /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1/device_path
# echo -n 1           > /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1/enable

# mkdir /sys/kernel/config/nvmet/ports/1
# echo -n 127.0.0.1 > /sys/kernel/config/nvmet/ports/1/addr_traddr
# echo -n tcp       > /sys/kernel/config/nvmet/ports/1/addr_trtype
# echo -n 8009      > /sys/kernel/config/nvmet/ports/1/addr_trsvcid
# echo -n ipv4      > /sys/kernel/config/nvmet/ports/1/addr_adrfam

# cd /sys/kernel/config/nvmet/ports/1/subsystems
# ln -s ../../../subsystems/storage1 storage1
# ln -s ../../../subsystems/storage2 storage2
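
The per-subsystem commands above are the same for storage1 and storage2, so they can be folded into one function. This is only a sketch: the function name nvmet_add_subsys and the NVMET_ROOT override are hypothetical and not part of nvmet or nvme-cli, and it assumes that mkdir -p is accepted for directories that configfs has already created.

```shell
#!/bin/sh
# Hypothetical helper: create one nvmet subsystem with a single namespace.
# NVMET_ROOT is an illustrative override; the default is the configfs path
# used in the commands above.
nvmet_add_subsys() {
    name="$1"   # subsystem name, e.g. storage1
    dev="$2"    # backing block device, e.g. /dev/nullb0
    root="${NVMET_ROOT:-/sys/kernel/config/nvmet}"
    subsys="$root/subsystems/$name"

    # Same order as the manual steps: subsystem, allow_any_host, namespace.
    mkdir -p "$subsys" || return 1
    # printf '%s' writes the value without a trailing newline, like echo -n.
    printf '%s' 1 > "$subsys/attr_allow_any_host" || return 1

    mkdir -p "$subsys/namespaces/1" || return 1
    printf '%s' "$dev" > "$subsys/namespaces/1/device_path" || return 1
    printf '%s' 1 > "$subsys/namespaces/1/enable"
}

# Usage (as root, after the modprobe commands; not run here):
#   nvmet_add_subsys storage1 /dev/nullb0
#   nvmet_add_subsys storage2 /dev/nullb0
```

The port setup and the two symlinks are left as separate commands because they are done once per port rather than once per subsystem.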

2) Use nvme-cli to set up a persistent tcp connection to 127.0.0.1:8009
$ sudo nvme discover -g -G -o json -t tcp -a 127.0.0.1 -s 8009 --persistent
{
  "device" : "nvme1",
  "genctr" : 4,
  "records" : [
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage1",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    },
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage2",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    }
  ]
}

3) Confirm that a new device was created (/dev/nvme1)
$ ls -l /dev/nvme*
crw------- 1 root root 240,  0 Feb 26 11:09 /dev/nvme0
brw-rw---- 1 root disk 259,  0 Feb 26 11:09 /dev/nvme0n1
brw-rw---- 1 root disk 259,  1 Feb 26 11:09 /dev/nvme0n1p1
brw-rw---- 1 root disk 259,  2 Feb 26 11:09 /dev/nvme0n1p2
brw-rw---- 1 root disk 259,  3 Feb 26 11:09 /dev/nvme0n1p3
brw-rw---- 1 root disk 259,  4 Feb 26 11:09 /dev/nvme0n1p4
brw-rw---- 1 root disk 259,  5 Feb 26 11:09 /dev/nvme0n1p5
crw------- 1 root root 240,  1 Feb 26 11:16 /dev/nvme1
crw------- 1 root root  10, 33 Feb 26 11:11 /dev/nvme-fabrics

4) Confirm that we can retrieve log pages from persistent /dev/nvme1
$ sudo nvme discover -d nvme1 -o json
{
  "device" : "nvme1",
  "genctr" : 4,
  "records" : [
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage1",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    },
    {
      "trtype" : "tcp",
      "adrfam" : "ipv4",
      "subtype" : "nvme subsystem",
      "treq" : "not specified, sq flow control disable supported",
      "portid" : 1,
      "trsvcid" : "8009",
      "subnqn" : "storage2",
      "traddr" : "127.0.0.1",
      "sectype" : "none"
    }
  ]
}

5) Remove all nvmet configuration
# rm -f /sys/kernel/config/nvmet/ports/1/subsystems/storage1
# rm -f /sys/kernel/config/nvmet/ports/1/subsystems/storage2
# rmdir /sys/kernel/config/nvmet/ports/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage1/namespaces/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage2/namespaces/1
# rmdir /sys/kernel/config/nvmet/subsystems/storage1
# rmdir /sys/kernel/config/nvmet/subsystems/storage2

6) Confirm that the persistent device /dev/nvme1 is still there
$ ls -l /dev/nvme*
crw------- 1 root root 240,  0 Feb 26 11:09 /dev/nvme0
brw-rw---- 1 root disk 259,  0 Feb 26 11:09 /dev/nvme0n1
brw-rw---- 1 root disk 259,  1 Feb 26 11:09 /dev/nvme0n1p1
brw-rw---- 1 root disk 259,  2 Feb 26 11:09 /dev/nvme0n1p2
brw-rw---- 1 root disk 259,  3 Feb 26 11:09 /dev/nvme0n1p3
brw-rw---- 1 root disk 259,  4 Feb 26 11:09 /dev/nvme0n1p4
brw-rw---- 1 root disk 259,  5 Feb 26 11:09 /dev/nvme0n1p5
crw------- 1 root root 240,  1 Feb 26 11:32 /dev/nvme1    <- Should this device remain if the connection is lost?
crw------- 1 root root  10, 33 Feb 26 11:11 /dev/nvme-fabrics


7) Try to retrieve log pages from /dev/nvme1
$ sudo nvme discover --device nvme1 -o json

8) !!!! COMPUTER FREEZES !!!!
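
For anyone reproducing this, a possible guard before step 7 is to look at the controller state in sysfs and only issue the discover command when the controller is live. This is a minimal sketch, assuming the kernel exposes the state at /sys/class/nvme/<ctrl>/state with values such as "live" or "connecting". The function name ctrl_is_live and the SYSFS_NVME override are hypothetical, and this has not been verified to avoid the freeze on kernel 5.8.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the controller reports state "live".
# SYSFS_NVME is an illustrative override so the check can be tried against
# a fake directory tree instead of the real sysfs.
ctrl_is_live() {
    ctrl="$1"   # controller name, e.g. nvme1
    state_file="${SYSFS_NVME:-/sys/class/nvme}/$ctrl/state"

    # A missing or unreadable file is treated as "not live".
    [ -r "$state_file" ] || return 1

    state=$(cat "$state_file")
    [ "$state" = "live" ]
}

# Usage (not run here): query the persistent controller only when it is live.
#   ctrl_is_live nvme1 && sudo nvme discover --device nvme1 -o json
```

The check is racy by nature (the state can change between the read and the command), so it narrows the window rather than closing it.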

Regards,

Martin Belanger
Dell Inc.


