[PATCH] nvme-pci: fix resume after AER recovery
Grochowski, Maciej
Maciej.Grochowski at sony.com
Mon Jan 30 10:54:32 PST 2023
The issue was spotted on 5.10 LTS. I checked the sources, and they do indeed predate commit 387c72cd.
Let me update that and retest.
-----Original Message-----
From: Keith Busch <kbusch at kernel.org>
Sent: Monday, January 30, 2023 10:43 AM
To: Christoph Hellwig <hch at lst.de>
Cc: sagi at grimberg.me; linux-nvme at lists.infradead.org; Grochowski, Maciej <Maciej.Grochowski at sony.com>
Subject: Re: [PATCH] nvme-pci: fix resume after AER recovery
On Mon, Jan 30, 2023 at 11:35:36AM -0700, Keith Busch wrote:
> On Mon, Jan 30, 2023 at 11:14:49AM +0100, Christoph Hellwig wrote:
> > All I/O on an nvme controller hangs after injecting a malformed TLP
> > error using aer-inject with an error file like:
> >
> > --- snip ---
> > AER
> > PCI_ID WWWW:XX.YY.Z
> > UNCOR_STATUS COMP_TIME
> > HEADER_LOG 0 1 2 3
> > --- snip ---
> >
> > This is because in this case the ->resume method will be called after
> > ->error_injected and not ->slot_reset, leaving the controller in ->disabled
> > state and the queue frozen. Fix this by doing a controller reset to
> > resume as well.
>
> Why isn't slot_reset being called after error_detected? Driver should
> be returning "RESULT_NEEDS_RESET", which should have the pcie error
> handling always invoke the slot_reset callback.
Are you using an older kernel that doesn't have
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=387c72cdd7fb6bef650fb078d0f6ae9682abf631
?
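
For readers following the callback ordering discussed above, here is a
minimal userspace sketch of it. It is not kernel code. The enum values,
struct err_handlers, recover() and the retain_status flag are all
hypothetical names made up for illustration; retain_status stands in for
"the core keeps the result the driver returned from ->error_detected",
which is an assumption about what distinguishes the older tree from one
that has the commit referenced above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

/* Loosely modelled on the driver's PCI error handler callbacks. */
struct err_handlers {
	enum ers_result (*error_detected)(void);
	enum ers_result (*slot_reset)(void);
	void (*resume)(void);
};

static char trace[64];	/* records the order in which callbacks ran */

static void note(const char *name)
{
	strcat(trace, name);
	strcat(trace, " ");
}

/* A driver that always asks for a reset, as described in the thread. */
static enum ers_result drv_error_detected(void)
{
	note("error_detected");
	return ERS_NEED_RESET;
}

static enum ers_result drv_slot_reset(void)
{
	note("slot_reset");
	return ERS_RECOVERED;
}

static void drv_resume(void)
{
	note("resume");
}

static const struct err_handlers drv = {
	drv_error_detected, drv_slot_reset, drv_resume
};

/*
 * Simplified recovery walk. With retain_status == 0 the driver's
 * NEED_RESET answer is overwritten (assumed old behaviour), so
 * slot_reset is skipped and resume runs directly.
 */
static const char *recover(const struct err_handlers *h, int retain_status)
{
	enum ers_result status;

	trace[0] = '\0';
	status = h->error_detected();

	if (!retain_status)
		status = ERS_RECOVERED;	/* driver's request is lost */

	if (status == ERS_NEED_RESET)
		status = h->slot_reset();

	if (status == ERS_RECOVERED)
		h->resume();

	return trace;
}
```

Calling recover(&drv, 1) runs error_detected, slot_reset and resume in
that order, while recover(&drv, 0) goes straight from error_detected to
resume, which is the sequence the patch description reports.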