[PATCH] nvme-pci: fix resume after AER recovery
Javier.gonz at samsung.com
Mon Feb 6 06:02:20 PST 2023
On 03.02.2023 18:45, Grochowski, Maciej wrote:
>> > I run remove/rescan for this Samsung PM1733 and it looks like it works
>> > fine on both 5.15 and 6.1.9
>>
>> Sounds like the Samsung wants a longer, non-standard delay between SBR and reinit.
>
>Thanks for the suggestion.
>
>Hi Javier:
>
>We have 2 Samsung NVMe drives: PM9A3 and PM1733
>When we issue a fatal AER via aer_inject, these drives are not able to recover and fail with
>"Unable to change power state from D3cold to D0, device inaccessible"
>
>Log repeated from the previous mail (the behavior is consistent on the 5.15 and 6.1 kernels):
>```
>[ 334.527200] pcieport 0000:00:03.4: aer_inject: Injecting errors 00000000/00004000 into device 0000:0b:00.0
>[ 334.537072] pcieport 0000:00:03.4: AER: Uncorrected (Fatal) error received: 0000:0b:00.0
>[ 334.545320] nvme 0000:0b:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
>[ 334.556682] pcieport 0000:00:03.4: AER: broadcast error_detected message
>[ 334.563467] nvme nvme5: frozen state error detected, reset controller
>[ 334.615434] pcieport 0000:00:03.4: pciehp: pciehp_reset_slot: SLOTCTRL 70 write cmd 0
>[ 335.655445] pcieport 0000:00:03.4: pciehp: pciehp_reset_slot: SLOTCTRL 70 write cmd 1008
>[ 335.663647] pcieport 0000:00:03.4: AER: Root Port link has been reset (0)
>[ 335.670523] pcieport 0000:00:03.4: AER: broadcast slot_reset message
>[ 335.676954] nvme nvme5: restart after slot reset
>[ 335.684371] nvme 0000:0b:00.0: restoring config space at offset 0x3c (was 0xffffffff, writing 0x10a)
>[ 336.427724] nvme 0000:0b:00.0: restoring config space at offset 0x8 (was 0xffffffff, writing 0x1080200)
>[ 336.437510] nvme 0000:0b:00.0: restoring config space at offset 0x4 (was 0xffffffff, writing 0x100406)
>[ 336.447215] nvme 0000:0b:00.0: restoring config space at offset 0x0 (was 0xffffffff, writing 0xa824144d)
>[ 336.457117] pcieport 0000:00:03.4: AER: broadcast resume message
>[ 336.479575] nvme 0000:0b:00.0: Unable to change power state from D3cold to D0, device inaccessible
>[ 336.494264] nvme nvme5: Removing after probe failure status: -19
>[ 336.535861] pcieport 0000:00:03.4: AER: device recovery successful
>[ 336.535899] nvme 0000:0b:00.0: vgaarb: pci_notify
>[ 336.691465] pci 0000:0b:00.0: vgaarb: pci_notify
>```
>
>The same experiment with another NVMe vendor seems to work fine (I tried a KIOXIA NVMe).
>Is that something you can take a look at?
Thanks for the note, Maciej. I will report this internally.

Keith, Christoph,

Is there a chance we can get a quirk for this FW? It seems like an
issue on our side that is creating problems.

Thanks,
Javier