[PATCH] nvme-pci: fix resume after AER recovery

Grochowski, Maciej Maciej.Grochowski at sony.com
Mon Jan 30 10:54:32 PST 2023


The issue was spotted on 5.10 LTS. I checked the sources, and they do indeed predate commit 387c72cd.
Let me update that and retest.

-----Original Message-----
From: Keith Busch <kbusch at kernel.org> 
Sent: Monday, January 30, 2023 10:43 AM
To: Christoph Hellwig <hch at lst.de>
Cc: sagi at grimberg.me; linux-nvme at lists.infradead.org; Grochowski, Maciej <Maciej.Grochowski at sony.com>
Subject: Re: [PATCH] nvme-pci: fix resume after AER recovery

On Mon, Jan 30, 2023 at 11:35:36AM -0700, Keith Busch wrote:
> On Mon, Jan 30, 2023 at 11:14:49AM +0100, Christoph Hellwig wrote:
> > All I/O on a nvme controllers hangs after injecting a malformed TLP 
> > error using aer-inject with an error file like:
> > 
> > --- snip ---
> > AER
> > PCI_ID WWWW:XX.YY.Z
> > UNCOR_STATUS COMP_TIME
> > HEADER_LOG 0 1 2 3
> > --- snip ---
> > 
> > This is because in this case the ->resume method will be called after
> > ->error_injected and not ->slot_reset, leaving the controller in
> > ->disabled state and the queue frozen.  Fix this by doing a controller
> > reset to resume as well.
> 
> Why isn't slot_reset being called after error_detected? Driver should 
> be returning "RESULT_NEEDS_RESET", which should have the pcie error 
> handling always invoke the slot_reset callback.

Are you using an older kernel that doesn't have

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=387c72cdd7fb6bef650fb078d0f6ae9682abf631

?
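
For readers following the thread, the callback ordering under discussion can be sketched as a small userspace model. This is not kernel code and is only a rough illustration: the `model_*` names, the `struct model_ctrl` fields, and the `lose_status` switch are all hypothetical, and the enum only borrows the spirit of the kernel's PCI_ERS_RESULT_* values. It assumes a driver whose error_detected step disables the controller and freezes the queues, whose slot_reset step undoes that, and whose resume step does not.

```c
/* Userspace model of the AER recovery callback order. NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative results, loosely named after the kernel's PCI_ERS_RESULT_*. */
enum ers_result { ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_RECOVERED, ERS_DISCONNECT };

/* Hypothetical stand-in for the state a driver tracks per controller. */
struct model_ctrl {
    bool disabled;
    bool queues_frozen;
    int slot_reset_calls;
    int resume_calls;
};

/* Error reported: quiesce the device and ask the core for a reset. */
static enum ers_result model_error_detected(struct model_ctrl *c)
{
    c->disabled = true;
    c->queues_frozen = true;
    return ERS_NEED_RESET;
}

/* Slot reset: the only step in this model that re-enables the device. */
static enum ers_result model_slot_reset(struct model_ctrl *c)
{
    c->disabled = false;
    c->queues_frozen = false;
    c->slot_reset_calls++;
    return ERS_RECOVERED;
}

/* Resume: assumed here NOT to reset or unfreeze anything by itself. */
static void model_resume(struct model_ctrl *c)
{
    c->resume_calls++;
}

/*
 * Simplified recovery walk. lose_status is a hypothetical switch that models
 * a core which forgets the NEED_RESET answer before the slot_reset step.
 */
static void model_recover(struct model_ctrl *c, bool lose_status)
{
    enum ers_result status;

    assert(c != NULL);
    status = model_error_detected(c);
    if (lose_status)
        status = ERS_RECOVERED;      /* NEED_RESET result is dropped */
    if (status == ERS_NEED_RESET)
        status = model_slot_reset(c);
    if (status == ERS_RECOVERED)
        model_resume(c);
}
```

Calling model_recover() with lose_status set to false runs error_detected, slot_reset, then resume, and the controller ends up enabled. With lose_status set to true, resume runs without slot_reset and the model is left disabled with frozen queues, which matches the hang described in the quoted patch.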


