[PATCH] nvme-pci: fix resume after AER recovery
Klaus Jensen
its at irrelevant.dk
Tue Feb 7 02:36:32 PST 2023
On Feb 7 09:29, Javier.gonz at samsung.com wrote:
> On 07.02.2023 01:51, Grochowski, Maciej wrote:
> > I have tried the suggested approach, with some modification: the pci_device
> > in pci_reset_secondary_bus is actually the bridge, not the NVMe device
> > itself, thus I checked the devices behind that bridge to see if any has the
> > D0 bit and, based on that logic, I run the custom delay.
> >
> > Unfortunately, even with this approach I see the same issue for both
> > Samsung drives, and based on the kernel logs I can see that the wait for
> > the secondary bus reset gets increased. Thus it seems like this quirk
> > doesn't work for some reason. (I also tried increasing the delays to
> > different values, but it didn't work.)
>
> Too bad.
>
> I will write to you separately to get some dumps from the device. We have
> not seen this before, so we need to understand this a bit better.
>
> Regarding the quirk, we are looking into it. We will come back with
> something in this thread later. Cc'ing Kanchan and Klaus.
>
I dug up a PM1733 and I am not immediately able to reproduce on 6.2-rc7.
With an aer-inject error file,

  AER
  PCI_ID 0000:04:00.0
  UNCOR_STATUS MALF_TLP
  HEADER_LOG 0 1 2 3

I'm getting a Fatal error with type "Inaccessible, (Unregistered Agent
ID)", but it still recovers successfully:

  pcieport 0000:00:01.2: aer_inject: Injecting errors 00000000/00040000 into device 0000:04:00.0
  pcieport 0000:00:01.2: AER: Uncorrected (Fatal) error received: 0000:04:00.0
  nvme 0000:04:00.0: AER: PCIe Bus Error: severity=Uncorrected (Fatal), type=Inaccessible, (Unregistered Agent ID)
  nvme nvme1: frozen state error detected, reset controller
  pcieport 0000:03:00.0: AER: Downstream Port link has been reset (0)
  nvme nvme1: restart after slot reset
  nvme nvme1: Shutdown timeout set to 10 seconds
  nvme nvme1: 32/0/0 default/read/poll queues
  pcieport 0000:03:00.0: AER: device recovery successful

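For comparing logs from different setups, a minimal Python sketch like the one below could pull the AER recovery outcome per port out of a captured kernel log. It is only an illustration: the function name aer_recovery_outcomes and the sample list are hypothetical, the lines are assumed to have their dmesg timestamps stripped (as in the log above), and the "device recovery failed" wording is an assumption about the counterpart of the "successful" message seen here.

```python
import re

# Outcome messages to look for. "device recovery successful" is taken from
# the log above; the "failed" wording is an assumed counterpart message.
OUTCOMES = {
    "device recovery successful": "recovered",
    "device recovery failed": "failed",
}

# Matches "<driver> <domain:bus:dev.fn>: AER: <message>".
# Assumes any leading dmesg timestamp has already been stripped.
AER_LINE = re.compile(
    r"^(?P<driver>\S+) "
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AER: (?P<msg>.*)$"
)


def aer_recovery_outcomes(lines):
    """Return {port BDF: 'recovered' | 'failed'} for each AER outcome line."""
    results = {}
    for line in lines:
        m = AER_LINE.match(line.strip())
        if not m:
            continue
        outcome = OUTCOMES.get(m.group("msg"))
        if outcome:
            results[m.group("bdf")] = outcome
    return results


# A few lines copied from the log above.
sample = [
    "pcieport 0000:00:01.2: AER: Uncorrected (Fatal) error received: 0000:04:00.0",
    "nvme nvme1: frozen state error detected, reset controller",
    "pcieport 0000:03:00.0: AER: Downstream Port link has been reset (0)",
    "nvme nvme1: restart after slot reset",
    "pcieport 0000:03:00.0: AER: device recovery successful",
]

print(aer_recovery_outcomes(sample))
```

Lines that report an error or a link reset but no final outcome are ignored, so a port that never logs an outcome simply does not appear in the result.
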
Maciej, can you share firmware revision information and a bit more
details on your reproducer/setup that might allow us to replicate?
Thanks,
Klaus