[BUG][6.2.11] WD SN770 nvme controller is down

Lyndon Sanche lsanche at lyndeno.ca
Fri Apr 21 15:21:36 PDT 2023


Hello:

Apologies if this is not the right channel for this kind of message. I 
have seen reports of similar issues on this list, so I thought I would 
send it here.

I recently replaced the drive in my laptop (Dell XPS 15 9560) with 
a WD_BLACK SN770 2TB. I also have the same model in my desktop.

On my laptop I will randomly get the following in the kernel log:
[ 1753.922566] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[ 1753.922574] nvme nvme0: Does your device have a faulty power saving mode enabled?
[ 1753.922578] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
[ 1753.940085] nvme0n1: I/O Cmd(0x2) @ LBA 6278080, 187 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940103] I/O error, dev nvme0n1, sector 50224640 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940124] nvme0n1: I/O Cmd(0x2) @ LBA 6278267, 229 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940133] I/O error, dev nvme0n1, sector 50226136 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940143] nvme0n1: I/O Cmd(0x2) @ LBA 6278496, 3 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940149] I/O error, dev nvme0n1, sector 50227968 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 3
[ 1753.940172] nvme0n1: I/O Cmd(0x2) @ LBA 6278499, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940179] I/O error, dev nvme0n1, sector 50227992 op 0x0:(READ) flags 0x84700 phys_seg 102 prio class 3
[ 1753.940190] nvme0n1: I/O Cmd(0x2) @ LBA 6278755, 169 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940196] I/O error, dev nvme0n1, sector 50230040 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940205] nvme0n1: I/O Cmd(0x2) @ LBA 6278924, 36 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940211] I/O error, dev nvme0n1, sector 50231392 op 0x0:(READ) flags 0x80700 phys_seg 28 prio class 3
[ 1753.940227] nvme0n1: I/O Cmd(0x2) @ LBA 6278961, 129 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940233] I/O error, dev nvme0n1, sector 50231688 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940241] nvme0n1: I/O Cmd(0x2) @ LBA 6279090, 14 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940247] I/O error, dev nvme0n1, sector 50232720 op 0x0:(READ) flags 0x80700 phys_seg 14 prio class 3
[ 1753.940266] nvme0n1: I/O Cmd(0x2) @ LBA 6277056, 141 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940272] I/O error, dev nvme0n1, sector 50216448 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940283] nvme0n1: I/O Cmd(0x2) @ LBA 6277197, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940289] I/O error, dev nvme0n1, sector 50217576 op 0x0:(READ) flags 0x84700 phys_seg 92 prio class 3
[ 1753.945614] nvme 0000:04:00.0: enabling device (0000 -> 0002)
[ 1753.945944] nvme nvme0: Disabling device after reset failure: -19
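
The two kinds of error lines describe the same failed reads in different units: the nvme0n1 lines give a logical block address (LBA) and a block count, while the block layer's "I/O error" lines give a 512-byte sector number. The Python sketch below (the regular expressions and the helper name parse_pairs are illustrative, not part of any tool) pairs them up and checks the ratio. For every pair in this log the sector is exactly 8 times the LBA, so the namespace appears to use 4096-byte logical blocks. It also folds sct/sc into one status value, 0x371; if I read include/linux/nvme.h correctly, that is NVME_SC_HOST_ABORTED_CMD, the status the Linux host driver gives to commands it cancels during a controller reset. That reading is an assumption worth verifying, but if it holds, these read errors are a side effect of the controller dropping off the bus rather than media errors.

```python
import re

# nvme0n1 / block-layer error lines from the log above, one per line.
LOG = """\
[ 1753.940085] nvme0n1: I/O Cmd(0x2) @ LBA 6278080, 187 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940103] I/O error, dev nvme0n1, sector 50224640 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940124] nvme0n1: I/O Cmd(0x2) @ LBA 6278267, 229 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940133] I/O error, dev nvme0n1, sector 50226136 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940143] nvme0n1: I/O Cmd(0x2) @ LBA 6278496, 3 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940149] I/O error, dev nvme0n1, sector 50227968 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 3
[ 1753.940172] nvme0n1: I/O Cmd(0x2) @ LBA 6278499, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940179] I/O error, dev nvme0n1, sector 50227992 op 0x0:(READ) flags 0x84700 phys_seg 102 prio class 3
[ 1753.940190] nvme0n1: I/O Cmd(0x2) @ LBA 6278755, 169 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940196] I/O error, dev nvme0n1, sector 50230040 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940205] nvme0n1: I/O Cmd(0x2) @ LBA 6278924, 36 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940211] I/O error, dev nvme0n1, sector 50231392 op 0x0:(READ) flags 0x80700 phys_seg 28 prio class 3
[ 1753.940227] nvme0n1: I/O Cmd(0x2) @ LBA 6278961, 129 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940233] I/O error, dev nvme0n1, sector 50231688 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940241] nvme0n1: I/O Cmd(0x2) @ LBA 6279090, 14 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940247] I/O error, dev nvme0n1, sector 50232720 op 0x0:(READ) flags 0x80700 phys_seg 14 prio class 3
[ 1753.940266] nvme0n1: I/O Cmd(0x2) @ LBA 6277056, 141 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940272] I/O error, dev nvme0n1, sector 50216448 op 0x0:(READ) flags 0x84700 phys_seg 127 prio class 3
[ 1753.940283] nvme0n1: I/O Cmd(0x2) @ LBA 6277197, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 1753.940289] I/O error, dev nvme0n1, sector 50217576 op 0x0:(READ) flags 0x84700 phys_seg 92 prio class 3
"""

# NVMe-layer line: starting LBA, block count and the (sct, sc) status pair.
NVME_RE = re.compile(
    r"LBA (\d+), (\d+) blocks, .*\(sct 0x([0-9a-f]+) / sc 0x([0-9a-f]+)\)"
)
# Block-layer line: starting sector in 512-byte units.
BLK_RE = re.compile(r"I/O error, dev \S+, sector (\d+)")


def parse_pairs(log):
    """Yield (lba, blocks, sector, status) for each NVMe line and the
    block-layer line that follows it."""
    pending = None
    for line in log.splitlines():
        m = NVME_RE.search(line)
        if m:
            # Combine the status code type and status code as (sct << 8) | sc.
            status = (int(m.group(3), 16) << 8) | int(m.group(4), 16)
            pending = (int(m.group(1)), int(m.group(2)), status)
            continue
        m = BLK_RE.search(line)
        if m and pending is not None:
            lba, blocks, status = pending
            yield lba, blocks, int(m.group(1)), status
            pending = None


pairs = list(parse_pairs(LOG))
# Exactly one sector/LBA ratio is expected; the unpacking fails loudly otherwise.
(ratio,) = {sector // lba for lba, _, sector, _ in pairs}
exact = all(sector == ratio * lba for lba, _, sector, _ in pairs)
lba_bytes = 512 * ratio
statuses = {status for *_, status in pairs}

print(f"{len(pairs)} pairs, sector = {ratio} x LBA (exact: {exact})")
print(f"logical block size: {lba_bytes} bytes")
print("combined status:", ", ".join(f"0x{s:03x}" for s in sorted(statuses)))
```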

This happens during boot and at other random times, whether the system 
is idle or running moderately intensive tasks. The drive appears to 
reset and go read-only, but afterwards neither reads nor writes work, 
and I have to hard-reboot the laptop.

I cannot retrieve this log afterwards, since nothing can be written to 
the drive once this happens; I can only capture it by leaving 
"dmesg -w" running in a terminal until the error occurs.
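
"dmesg -w" only shows the records on screen. A minimal Python sketch of the same idea that also keeps a copy follows; it assumes root access and a second writable device that does not depend on the NVMe drive (for example a USB stick), and the function names are hypothetical. It reads /dev/kmsg, where each read() returns one record of the form "prio,seq,usec,flags;text", and appends records whose text contains "nvme" to the output file, calling fsync after each one so the lines should survive a hard reboot.

```python
import os


def parse_kmsg_record(record):
    """Split one /dev/kmsg record ("prio,seq,usec,flags;text") into fields."""
    prefix, _, body = record.partition(";")
    prio, seq, usec = (int(x) for x in prefix.split(",")[:3])
    return {
        "level": prio & 7,                # syslog level in the low 3 bits
        "facility": prio >> 3,            # 0 means kernel
        "seq": seq,
        "time": usec / 1_000_000,         # seconds since boot, like dmesg's stamp
        "text": body.split("\n", 1)[0],   # drop " KEY=value" continuation lines
    }


def follow_kmsg(out_path, pattern="nvme", kmsg="/dev/kmsg"):
    """Append matching kernel records to out_path, fsync'ing after each one.

    out_path should be on a device other than the failing NVMe drive.
    Reading /dev/kmsg normally requires root.
    """
    with open(kmsg, "rb", buffering=0) as src, open(out_path, "a") as dst:
        while True:
            try:
                raw = src.read(8192)      # one read() returns one whole record
            except BrokenPipeError:
                continue                  # EPIPE: older records were overwritten
            if not raw:
                break
            rec = parse_kmsg_record(raw.decode("utf-8", "replace"))
            if pattern in rec["text"]:
                dst.write(f"[{rec['time']:12.6f}] {rec['text']}\n")
                dst.flush()
                os.fsync(dst.fileno())
```

On the affected laptop this would be started as root with something like follow_kmsg("/mnt/usb/nvme.log"), where the mount point is only an example. Opening /dev/kmsg starts at the oldest record still in the buffer, so earlier nvme messages from the same boot are copied too.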

My research online turns up a lot of information about ASPM and APST 
bugs and their workarounds, including for the drive that originally 
shipped with this laptop.

The suggested workarounds are to boot with pcie_aspm=off or 
nvme_core.default_ps_max_latency_us=0. I have tried each one on its 
own as well as both together, and no combination prevents the issue 
from recurring.
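
Since neither parameter changes the behaviour, it may be worth confirming that both actually reached the kernel. A minimal Python sketch, assuming the parameters are passed on the kernel command line (the helper name missing_params is hypothetical): it splits the contents of /proc/cmdline and reports each expected parameter that is absent or set to a different value.

```python
import shlex

# Values suggested by the kernel's own "controller is down" message.
WANTED = {
    "pcie_aspm": "off",
    "nvme_core.default_ps_max_latency_us": "0",
}


def missing_params(cmdline):
    """Return the WANTED parameters that are absent from, or set to a
    different value on, a kernel command line such as /proc/cmdline."""
    seen = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        # In this sketch a later occurrence overwrites an earlier one.
        seen[key] = value if sep else None
    return {k: v for k, v in WANTED.items() if seen.get(k) != v}
```

On the laptop it would be called as missing_params(open("/proc/cmdline").read()); an empty dict means both parameters are present. The value nvme_core is actually using can also be read from /sys/module/nvme_core/parameters/default_ps_max_latency_us.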

Here is the power-state information for the SSD, along with the 
firmware version (the latest available when I updated it from a 
Windows To Go installation):
$ sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
...
fr : 731100WD
...
ps 0 : mp:5.40W operational enlat:0 exlat:0 rrt:0 rrl:0
            rwt:0 rwl:0 idle_power:0.3000W active_power:5.40W
            active_power_workload:80K 128KiB SW
ps 1 : mp:3.50W operational enlat:0 exlat:0 rrt:0 rrl:0
            rwt:0 rwl:0 idle_power:0.3000W active_power:3.00W
            active_power_workload:80K 128KiB SW
ps 2 : mp:2.40W operational enlat:0 exlat:0 rrt:0 rrl:0
            rwt:0 rwl:0 idle_power:0.3000W active_power:2.00W
            active_power_workload:80K 128KiB SW
ps 3 : mp:0.0150W non-operational enlat:1500 exlat:2500 rrt:3 rrl:3
            rwt:3 rwl:3 idle_power:0.0150W active_power:-
            active_power_workload:-
ps 4 : mp:0.0050W non-operational enlat:10000 exlat:6000 rrt:4 rrl:4
            rwt:4 rwl:4 idle_power:0.0050W active_power:-
            active_power_workload:-
ps 5 : mp:0.0033W non-operational enlat:176000 exlat:25000 rrt:5 rrl:5
            rwt:5 rwl:5 idle_power:0.0033W active_power:-
            active_power_workload:-
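
The power-state table is what APST works from. The simplified Python model below (helper names hypothetical) parses the first line of each descriptor above, prints the entry + exit latency of the three non-operational states (4000, 16000 and 201000 us), and lists which of them pass a given latency limit. It assumes, based on my reading of nvme_configure_apst() in drivers/nvme/host/core.c, that the driver skips a non-operational state whose exit latency exceeds nvme_core.default_ps_max_latency_us and turns APST off entirely when that limit is 0. The real driver applies further timeout and tolerance rules that are not modelled here.

```python
import re

# First line of each power-state descriptor from the `nvme id-ctrl` output above.
PS_TABLE = """\
ps 0 : mp:5.40W operational enlat:0 exlat:0 rrt:0 rrl:0
ps 1 : mp:3.50W operational enlat:0 exlat:0 rrt:0 rrl:0
ps 2 : mp:2.40W operational enlat:0 exlat:0 rrt:0 rrl:0
ps 3 : mp:0.0150W non-operational enlat:1500 exlat:2500 rrt:3 rrl:3
ps 4 : mp:0.0050W non-operational enlat:10000 exlat:6000 rrt:4 rrl:4
ps 5 : mp:0.0033W non-operational enlat:176000 exlat:25000 rrt:5 rrl:5
"""

PS_RE = re.compile(
    r"ps\s+(\d+)\s*:\s*mp:([\d.]+)W\s+(non-operational|operational)"
    r"\s+enlat:(\d+)\s+exlat:(\d+)"
)


def parse_power_states(text):
    """Return one dict per power-state descriptor line."""
    return [
        {
            "ps": int(m.group(1)),
            "max_power_w": float(m.group(2)),
            "non_operational": m.group(3) == "non-operational",
            "enlat_us": int(m.group(4)),
            "exlat_us": int(m.group(5)),
        }
        for m in PS_RE.finditer(text)
    ]


def apst_candidates(states, ps_max_latency_us):
    """Simplified model: non-operational states within an exit-latency cap."""
    if ps_max_latency_us == 0:
        return []  # assumption: a limit of 0 disables APST entirely
    return [
        s["ps"]
        for s in states
        if s["non_operational"] and s["exlat_us"] <= ps_max_latency_us
    ]


states = parse_power_states(PS_TABLE)
for s in states:
    if s["non_operational"]:
        print(f"ps {s['ps']}: entry + exit = {s['enlat_us'] + s['exlat_us']} us")
```

Under this model, apst_candidates(states, 5000) returns [3] and apst_candidates(states, 0) returns [], which matches the intent of the 0 setting suggested in the kernel log: with APST off, the driver should not program any autonomous transitions into ps 3 to 5.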

Are there any other steps I can take to troubleshoot this problem? I 
took the drive back to the store where I bought it, but all of their 
tests passed, so a replacement or return is unlikely. Are there other 
settings, or possibly patches, that I could try?

Note that none of these problems occur on my desktop (ASUS X570 
motherboard), which has the same model of drive.

I am grateful for any help that can be provided.

Thank you,

Lyndon Sanche






More information about the Linux-nvme mailing list