[PATCH] nvme-pci: fix resume after AER recovery

Grochowski, Maciej Maciej.Grochowski at sony.com
Fri Feb 3 10:45:35 PST 2023


> > I run remove/rescan for this Samsung PM1733 and it looks like it works 
> > fine on both 5.15 and 6.1.9
>
> Sounds like the Samsung wants a longer, non-standard delay between SBR and reinit.

Thanks for the suggestion.

Hi Javier: 

We have two Samsung NVMe drives: PM9A3 and PM1733.
When we inject a fatal AER error via aer_inject, these drives are not able to recover; they fail with
"Unable to change power state from D3cold to D0, device inaccessible".

Repeated log from the previous mail (the behavior is consistent on the 5.15 and 6.1 kernels):
```
[  334.527200] pcieport 0000:00:03.4: aer_inject: Injecting errors 00000000/00004000 into device 0000:0b:00.0
[  334.537072] pcieport 0000:00:03.4: AER: Uncorrected (Fatal) error received: 0000:0b:00.0
[  334.545320] nvme 0000:0b:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
[  334.556682] pcieport 0000:00:03.4: AER: broadcast error_detected message
[  334.563467] nvme nvme5: frozen state error detected, reset controller
[  334.615434] pcieport 0000:00:03.4: pciehp: pciehp_reset_slot: SLOTCTRL 70 write cmd 0
[  335.655445] pcieport 0000:00:03.4: pciehp: pciehp_reset_slot: SLOTCTRL 70 write cmd 1008
[  335.663647] pcieport 0000:00:03.4: AER: Root Port link has been reset (0)
[  335.670523] pcieport 0000:00:03.4: AER: broadcast slot_reset message
[  335.676954] nvme nvme5: restart after slot reset
[  335.684371] nvme 0000:0b:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x10a)
[  336.427724] nvme 0000:0b:00.0: restoring config space at offset 0x8 (was 0xffffffff, writing 0x1080200)
[  336.437510] nvme 0000:0b:00.0: restoring config space at offset 0x4 (was 0xffffffff, writing 0x100406)
[  336.447215] nvme 0000:0b:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0xa824144d)
[  336.457117] pcieport 0000:00:03.4: AER: broadcast resume message
[  336.479575] nvme 0000:0b:00.0: Unable to change power state from D3cold to D0, device inaccessible
[  336.494264] nvme nvme5: Removing after probe failure status: -19
[  336.535861] pcieport 0000:00:03.4: AER: device recovery successful
[  336.535899] nvme 0000:0b:00.0: vgaarb: pci_notify
[  336.691465] pci 0000:0b:00.0: vgaarb: pci_notify
```
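
To put numbers on the timing in the log above, the dmesg timestamps can be turned into intervals. The following is only a minimal Python sketch: the three log lines are copied from the log, the helper names (`parse`, `first_time`) are made up for illustration, and it assumes that the second SLOTCTRL write ("write cmd 1008") marks the end of the slot reset.

```python
import re

# Three lines copied verbatim from the log above. Treating the second
# SLOTCTRL write as "slot reset finished" is an assumption of this sketch.
LOG = """\
[  335.655445] pcieport 0000:00:03.4: pciehp: pciehp_reset_slot: SLOTCTRL 70 write cmd 1008
[  335.684371] nvme 0000:0b:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x10a)
[  336.479575] nvme 0000:0b:00.0: Unable to change power state from D3cold to D0, device inaccessible
"""

# Matches "[  335.655445] message" and captures the seconds and the message.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(log):
    """Return (seconds, message) pairs for every line with a dmesg timestamp."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def first_time(events, needle):
    """Return the timestamp of the first message containing needle, or None."""
    for t, msg in events:
        if needle in msg:
            return t
    return None


events = parse(LOG)
reset_done = first_time(events, "write cmd 1008")
first_restore = first_time(events, "restoring config space")
failure = first_time(events, "Unable to change power state")

print(f"reset -> first config restore: {(first_restore - reset_done) * 1000:.1f} ms")
print(f"reset -> D0 failure:           {(failure - reset_done) * 1000:.1f} ms")
```

With these three lines, the first config space restore comes about 29 ms after the second SLOTCTRL write, at which point the device still reads back 0xffffffff, and the D3cold message follows about 824 ms after that write.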

The same experiment with other NVMe vendors seems to work fine (I tried a KIOXIA NVMe drive).
Is that something you can take a look at?


