[PATCH] nvme-pci: fix resume after AER recovery
Keith Busch
kbusch at kernel.org
Mon Jan 30 10:35:36 PST 2023
On Mon, Jan 30, 2023 at 11:14:49AM +0100, Christoph Hellwig wrote:
> All I/O on an NVMe controller hangs after injecting a malformed TLP error
> using aer-inject with an error file like:
>
> --- snip ---
> AER
> PCI_ID WWWW:XX.YY.Z
> UNCOR_STATUS COMP_TIME
> HEADER_LOG 0 1 2 3
> --- snip ---
>
> This is because in this case the ->resume method is called right after
> ->error_detected instead of after ->slot_reset, leaving the controller in
> the disabled state and the queue frozen. Fix this by also doing a
> controller reset in ->resume.
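A minimal userspace model of the failure described above may help. It rests on loose assumptions: the controller is reduced to two hypothetical flags (`enabled`, `queues_frozen`), and every `model_*` name is an illustrative stand-in, not an nvme-pci function. It only shows why a sequence that skips ->slot_reset leaves I/O stuck unless ->resume also resets the controller.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified controller state; these are not the
 * nvme-pci driver's real structures or functions. */
struct model_ctrl {
	bool enabled;       /* controller enabled */
	bool queues_frozen; /* I/O queues frozen, so new I/O waits */
};

/* Model of ->error_detected: disable the controller and freeze I/O. */
static void model_error_detected(struct model_ctrl *c)
{
	c->enabled = false;
	c->queues_frozen = true;
}

/* Model of a full controller reset: re-enable and unfreeze. In the
 * normal flow, ->slot_reset is what triggers this. */
static void model_reset_ctrl(struct model_ctrl *c)
{
	c->enabled = true;
	c->queues_frozen = false;
}

/* Model of ->resume with the proposed change: reset here as well, so a
 * sequence that never calls ->slot_reset still ends with a working
 * controller. */
static void model_resume(struct model_ctrl *c)
{
	model_reset_ctrl(c);
}

/* I/O makes progress only if the controller is up and queues are
 * unfrozen. */
static bool model_io_can_proceed(const struct model_ctrl *c)
{
	return c->enabled && !c->queues_frozen;
}
```

Calling `model_error_detected()` and then `model_resume()` with no slot-reset step in between ends with `model_io_can_proceed()` returning true only because the modelled ->resume performs the reset; take the reset out of `model_resume()` and it stays false, mirroring the reported hang.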
Why isn't slot_reset being called after error_detected? The driver should
be returning PCI_ERS_RESULT_NEED_RESET, which should make the PCIe error
handling always invoke the slot_reset callback.
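The callback ordering argued above can be sketched as a simplified userspace model. This is not the kernel's pcie_do_recovery() code: the `MODEL_*` result names only loosely mirror enum pci_ers_result, and the model assumes a single device, that mmio_enabled and slot_reset always succeed, and it ignores result merging across devices and the bus reset done for a frozen channel.

```c
#include <stddef.h>

/* Hypothetical local results, loosely mirroring enum pci_ers_result. */
enum model_ers_result {
	MODEL_CAN_RECOVER,
	MODEL_NEED_RESET,
	MODEL_DISCONNECT,
	MODEL_RECOVERED,
};

/* Driver callbacks, in the order the model may invoke them. */
enum model_cb {
	CB_ERROR_DETECTED,
	CB_MMIO_ENABLED,
	CB_SLOT_RESET,
	CB_RESUME,
};

/*
 * Record which callbacks run, in order, for a given error_detected
 * result. Writes at most 4 entries into seq and returns how many.
 */
static size_t model_recovery(enum model_ers_result detected,
			     enum model_cb seq[4])
{
	size_t n = 0;
	enum model_ers_result status = detected;

	seq[n++] = CB_ERROR_DETECTED;

	if (status == MODEL_CAN_RECOVER) {
		seq[n++] = CB_MMIO_ENABLED;
		status = MODEL_RECOVERED; /* assume mmio_enabled succeeds */
	}

	if (status == MODEL_NEED_RESET) {
		seq[n++] = CB_SLOT_RESET;
		status = MODEL_RECOVERED; /* assume slot_reset succeeds */
	}

	/* resume only runs once the device is considered recovered */
	if (status == MODEL_RECOVERED)
		seq[n++] = CB_RESUME;

	return n;
}
```

In this model, a NEED_RESET result always yields error_detected, slot_reset, resume; resume without a preceding slot_reset only happens when error_detected reports something other than NEED_RESET (for example CAN_RECOVER).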
More information about the Linux-nvme mailing list