nvme-pci: Disabling device after reset failure: -5 occurs while AER recovery

Wed Mar 8 19:34:58 PST 2023

On 3/7/23 03:59, Sagi Grimberg wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 3/2/23 02:09, Tushar Dave wrote:
>> Hi,
>>
>> We are observing NVMe device disabled due to reset failure after
>> injecting Malformed TLP. DPC/AER recovery succeed but NVMe fails.
>> I tried this on 2 different system and it is 100% reproducible with 6.2
>> kernel.
>>
>> On my system, Samsung NVMe SSD Controller PM173X is directly behind the
>> Broadcom PCIe Switch Downstream Port.
>> MalformedTLP is injected by changing MaxPayload Size(MPS) of PCIe switch
>> to 128B (keeping NVMe device MPS 512B).
>>
>> e.g.
>> # change MPS of PCIe switch (a9:10.0)
>> $ setpci -v -s a9:10.0 cap_exp+0x8.w
>> 0000:a9:10.0 (cap 10 @68) @70 = 0857
>> $ setpci -v -s a9:10.0 cap_exp+0x8.w=0x0817
>> 0000:a9:10.0 (cap 10 @68) @70 0817
>> $ lspci -s a9:10.0 -vvv | grep -w DevCtl -A 2
>>          DevCtl:    CorrErr+ NonFatalErr+ FatalErr+ UnsupReq-
>>              RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>              MaxPayload 128 bytes, MaxReadReq 128 bytes
>>
>> # run some traffic on nvme (ab:00.0)
>> $ dd if=/dev/nvme0n1 of=/tmp/test bs=4K
>> dd: error reading '/dev/nvme0n1': Input/output error
>> 2+0 records in
>> 2+0 records out
>> 8192 bytes (8.2 kB, 8.0 KiB) copied, 0.0115304 s, 710 kB/s
>>
>> #kernel log:
>> [  163.034889] pcieport 0000:a5:01.0: EDR: EDR event received
>> [  163.041671] pcieport 0000:a5:01.0: EDR: Reported EDR dev: 0000:a9:10.0
>> [  163.049071] pcieport 0000:a9:10.0: DPC: containment event,
>> status:0x2009 source:0x0000
>> [  163.058014] pcieport 0000:a9:10.0: DPC: unmasked uncorrectable error
>> detected
>> [  163.066081] pcieport 0000:a9:10.0: PCIe Bus Error:
>> severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
>> [  163.078151] pcieport 0000:a9:10.0:   device [1000:c030] error
>> status/mask=00040000/00180000
>> [  163.087613] pcieport 0000:a9:10.0:    [18] MalfTLP
>> (First)
>> [  163.095281] pcieport 0000:a9:10.0: AER:   TLP Header: 60000080
>> ab0000ff 00000001 d1fd0000
>> [  163.104517] pcieport 0000:a9:10.0: AER: broadcast error_detected message
>> [  163.112095] nvme nvme0: frozen state error detected, reset controller
>> [  163.150716] nvme0c0n1: I/O Cmd(0x2) @ LBA 16, 32 blocks, I/O Error
>> (sct 0x3 / sc 0x71)
>> [  163.159802] I/O error, dev nvme0c0n1, sector 16 op 0x0:(READ) flags
>> 0x4080700 phys_seg 4 prio class 2
>> [  163.383661] pcieport 0000:a9:10.0: AER: broadcast slot_reset message
>> [  163.390895] nvme nvme0: restart after slot reset
>> [  163.396230] nvme 0000:ab:00.0: restoring config space at offset 0x3c
>> (was 0x100, writing 0x1ff)
>> [  163.406079] nvme 0000:ab:00.0: restoring config space at offset 0x30
>> (was 0x0, writing 0xe0600000)
>> [  163.416212] nvme 0000:ab:00.0: restoring config space at offset 0x10
>> (was 0x4, writing 0xe0710004)
>> [  163.426326] nvme 0000:ab:00.0: restoring config space at offset 0xc
>> (was 0x0, writing 0x8)
>> [  163.435666] nvme 0000:ab:00.0: restoring config space at offset 0x4
>> (was 0x100000, writing 0x100546)
>> [  163.446026] pcieport 0000:a9:10.0: AER: broadcast resume message
>> [  163.468311] nvme 0000:ab:00.0: saving config space at offset 0x0
>> (reading 0xa824144d)
>> [  163.477209] nvme 0000:ab:00.0: saving config space at offset 0x4
>> (reading 0x100546)
>> [  163.485876] nvme 0000:ab:00.0: saving config space at offset 0x8
>> (reading 0x1080200)
>> [  163.495399] nvme 0000:ab:00.0: saving config space at offset 0xc
>> (reading 0x8)
>> [  163.504149] nvme 0000:ab:00.0: saving config space at offset 0x10
>> (reading 0xe0710004)
>> [  163.513596] nvme 0000:ab:00.0: saving config space at offset 0x14
>> (reading 0x0)
>> [  163.522310] nvme 0000:ab:00.0: saving config space at offset 0x18
>> (reading 0x0)
>> [  163.531013] nvme 0000:ab:00.0: saving config space at offset 0x1c
>> (reading 0x0)
>> [  163.539704] nvme 0000:ab:00.0: saving config space at offset 0x20
>> (reading 0x0)
>> [  163.548353] nvme 0000:ab:00.0: saving config space at offset 0x24
>> (reading 0x0)
>> [  163.556983] nvme 0000:ab:00.0: saving config space at offset 0x28
>> (reading 0x0)
>> [  163.565615] nvme 0000:ab:00.0: saving config space at offset 0x2c
>> (reading 0xa80a144d)
>> [  163.574899] nvme 0000:ab:00.0: saving config space at offset 0x30
>> (reading 0xe0600000)
>> [  163.584215] nvme 0000:ab:00.0: saving config space at offset 0x34
>> (reading 0x40)
>> [  163.592941] nvme 0000:ab:00.0: saving config space at offset 0x38
>> (reading 0x0)
>> [  163.601554] nvme 0000:ab:00.0: saving config space at offset 0x3c
>> (reading 0x1ff)
>> [  210.089132] block nvme0n1: no usable path - requeuing I/O
>> [  223.776595] nvme nvme0: I/O 18 QID 0 timeout, disable controller
>> [  223.825236] nvme nvme0: Identify Controller failed (-4)
>> [  223.832145] nvme nvme0: Disabling device after reset failure: -5
> 
> At this point the device is not going to recover.
Yes, I agree.

I looked little bit more and found that nvme reset failure and second DPC, both 
were due to nvme_slot_reset() restoring MPS as part of pci_restore_state().

AFAICT, after the first DPC event occurs, nvme device MPS gets changed to 
_default_ value 128B (this is likely due to DPC link retraining). However as 
part of software AER recovery, nvme_slot_reset() restores device state, and that 
brings the nvme device MPS back to 512B. (MPS of PCIe switch a9:10.0 still 
remains at 128B).

At this point when nvme_reset_ctrl->nvme_reset_work() tries to enable the 
device, malformedTLP again getting generated and that causes second DPC, makes 
NVMe controller reset to fail as well.

Here is the log where I am dumping DEVCTL register of NVMe device in 
nvme_slot_reset() _before_ and _after_ the function pci_restore_state().

e.g.
[ 1037.601409] pcieport 0000:a5:01.0: EDR: EDR event received
[ 1037.608140] pcieport 0000:a5:01.0: EDR: Reported EDR dev: 0000:a9:10.0
[ 1037.615534] pcieport 0000:a9:10.0: DPC: containment event, status:0x2009 
source:0x0000
[ 1037.624477] pcieport 0000:a9:10.0: DPC: unmasked uncorrectable error detected
[ 1037.632549] pcieport 0000:a9:10.0: PCIe Bus Error: severity=Uncorrected 
(Fatal), type=Transaction Layer, (Receiver ID)
[ 1037.644654] pcieport 0000:a9:10.0:   device [1000:c030] error 
status/mask=00040000/00180000
[ 1037.654084] pcieport 0000:a9:10.0:    [18] MalfTLP                (First)
[ 1037.661755] pcieport 0000:a9:10.0: AER:   TLP Header: 60000080 ab0000ff 
00000001 48046000
[ 1037.671001] pcieport 0000:a9:10.0: AER: broadcast error_detected message
[ 1037.678587] nvme nvme0: frozen state error detected, reset controller
[ 1037.719590] nvme0c0n1: I/O Cmd(0x2) @ LBA 16, 32 blocks, I/O Error (sct 0x3 / 
sc 0x71)
[ 1037.728673] I/O error, dev nvme0c0n1, sector 16 op 0x0:(READ) flags 0x4080700 
phys_seg 4 prio class 2
[ 1037.952257] pcieport 0000:a9:10.0: AER: broadcast slot_reset message
[ 1037.959479] nvme nvme0: restart after slot reset
[ 1037.965882] nvme 0000:ab:00.0: nvme_slot_reset: before pci_restore_state 
DEVCTL: 0x2910 <<<<<<<<<<<<<
[ 1037.975846] nvme 0000:ab:00.0: restoring config space at offset 0x3c (was 
0x100, writing 0x1ff)
[ 1037.986313] nvme 0000:ab:00.0: restoring config space at offset 0x30 (was 
0x0, writing 0xe0600000)
[ 1037.996955] nvme 0000:ab:00.0: restoring config space at offset 0x10 (was 
0x4, writing 0xe0710004)
[ 1038.007523] nvme 0000:ab:00.0: restoring config space at offset 0xc (was 0x0, 
writing 0x8)
[ 1038.017312] nvme 0000:ab:00.0: restoring config space at offset 0x4 (was 
0x100000, writing 0x100546)
[ 1038.028111] nvme 0000:ab:00.0: nvme_slot_reset: after pci_restore_state, 
DEVCTL: 0x5957  <<<<<<<<<<<<<<
[ 1038.037582] pcieport 0000:a9:10.0: AER: broadcast resume message
[ 1038.076807] nvme 0000:ab:00.0: saving config space at offset 0x0 (reading 
0xa824144d)
  0x40)
...
..
.
[ 1038.202291] nvme 0000:ab:00.0: saving config space at offset 0x38 (reading 0x0)
[ 1038.210885] nvme 0000:ab:00.0: saving config space at offset 0x3c (reading 0x1ff)
[ 1055.325069] block nvme0n1: no usable path - requeuing I/O
[ 1100.113066] nvme nvme0: I/O 21 QID 0 timeout, disable controller
[ 1100.169808] nvme nvme0: Identify Controller failed (-4)
[ 1100.176623] nvme nvme0: Disabling device after reset failure: -5
...
..

followed by second DPC..

-Tushar

> 
> 
>> [  223.876833] Buffer I/O error on dev nvme0n1, logical block 2, async
>> page read
>> [  223.876939] pcieport 0000:a9:10.0: AER: device recovery successful
>> [  223.893404] pcieport 0000:a9:10.0: EDR: DPC port successfully recovered
>> [  223.901469] pcieport 0000:a5:01.0: EDR: Status for 0000:a9:10.0: 0x80
>> [  223.938902] pcieport 0000:a5:01.0: EDR: EDR event received
>> [  223.946077] pcieport 0000:a5:01.0: EDR: Reported EDR dev: 0000:a9:10.0
>> [  223.953901] pcieport 0000:a9:10.0: DPC: containment event,
>> status:0x2009 source:0x0000
>> [  223.963243] pcieport 0000:a9:10.0: DPC: unmasked uncorrectable error
>> detected
>> [  223.971691] pcieport 0000:a9:10.0: PCIe Bus Error:
>> severity=Uncorrected (Fatal), type=Transaction Layer, (Receiver ID)
>> [  223.984144] pcieport 0000:a9:10.0:   device [1000:c030] error
>> status/mask=00040000/00180000
>> [  223.993966] pcieport 0000:a9:10.0:    [18] MalfTLP
>> (First)
>> [  224.002023] pcieport 0000:a9:10.0: AER:   TLP Header: 60000080
>> ab0000ff 00000001 1e659000
>> [  224.011644] pcieport 0000:a9:10.0: AER: broadcast error_detected message
>> [  224.019604] nvme nvme0: frozen state error detected, reset controller
>> [  224.236597] pcieport 0000:a9:10.0: AER: broadcast slot_reset message
>> [  224.244676] nvme nvme0: restart after slot reset
>> [  224.250584] nvme 0000:ab:00.0: restoring config space at offset 0x3c
>> (was 0x100, writing 0x1ff)
>> [  224.260945] nvme 0000:ab:00.0: restoring config space at offset 0x30
>> (was 0x0, writing 0xe0600000)
>> [  224.271460] nvme 0000:ab:00.0: restoring config space at offset 0x10
>> (was 0x4, writing 0xe0710004)
>> [  224.282012] nvme 0000:ab:00.0: restoring config space at offset 0xc
>> (was 0x0, writing 0x8)
>> [  224.291713] nvme 0000:ab:00.0: restoring config space at offset 0x4
>> (was 0x100000, writing 0x100546)
>> [  224.302430] pcieport 0000:a9:10.0: AER: broadcast resume message
>> [  224.309618] pcieport 0000:a9:10.0: AER: device recovery successful
>> [  224.316968] pcieport 0000:a9:10.0: EDR: DPC port successfully recovered
>> [  224.324865] pcieport 0000:a5:01.0: EDR: Status for 0000:a9:10.0: 0x80
>>
>> After test, NVMe device still shows up in lspci and I can read pcie
>> config space but cannot read/write from/to the nvme controller.
>>
>> I am trying to narrow down but would appreciate help from linux-nvme.
> 
> The issue is that during the reset the controller failed to identify
> the controller, due to a second failure. The nvme controller does not
> know how to handle that.