[PATCH v3] nvme/pci: Log PCI_STATUS when the controller dies

Fri Dec 2 14:48:06 PST 2016

On Fri, Dec 02, 2016 at 08:58:57AM -0800, Andy Lutomirski wrote:
> When debugging nvme controller crashes, it's nice to know whether
> the controller died cleanly, so that the failure is reflected only in
> CSTS; whether it died and put an error in PCI_STATUS; or whether it died
> so badly that it stopped responding to PCI configuration space reads.
> 
> I've seen a failure that gives 0xffff in PCI_STATUS on a Samsung
> "SM951 NVMe SAMSUNG 256GB" with firmware "BXW75D0Q".
> 
> Reviewed-by: Christoph Hellwig <hch at lst.de>
> Signed-off-by: Andy Lutomirski <luto at kernel.org>
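
The three failure modes described in the quoted patch can be told apart from the two values the patch logs. The sketch below is a hypothetical userspace helper, not code from the patch: the `nvme_death` enum and `classify_death()` are invented names for illustration. It assumes that a device which no longer answers config reads returns all ones (0xffff for the 16-bit PCI_STATUS), and it checks the CSTS Controller Fatal Status bit and the standard PCI_STATUS error bits, with values copied from the Linux `pci_regs.h` and `nvme.h` definitions.

```c
#include <stdint.h>

/* NVMe CSTS bit 1: Controller Fatal Status (CFS), as NVME_CSTS_CFS in linux/nvme.h. */
#define NVME_CSTS_CFS 0x2u

/* Error bits of the PCI Status register (values as in linux/pci_regs.h). */
#define PCI_STATUS_PARITY           0x0100u /* master data parity error */
#define PCI_STATUS_SIG_TARGET_ABORT 0x0800u /* device signaled a target abort */
#define PCI_STATUS_REC_TARGET_ABORT 0x1000u /* device received a target abort */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000u /* device received a master abort */
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000u /* device asserted SERR# */
#define PCI_STATUS_DETECTED_PARITY  0x8000u /* device detected a parity error */

#define PCI_STATUS_ERROR_MASK                                   \
	(PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT |      \
	 PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
	 PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

/* Hypothetical classification of how the controller died. */
enum nvme_death {
	DEATH_UNKNOWN,     /* no evidence in either CSTS or PCI_STATUS */
	DEATH_CSTS_ONLY,   /* died cleanly: failure visible only in CSTS */
	DEATH_PCI_ERROR,   /* an error bit is set in PCI_STATUS */
	DEATH_CONFIG_DEAD, /* config space reads return all ones */
};

enum nvme_death classify_death(uint32_t csts, uint16_t pci_status)
{
	/* A device that stopped answering config reads returns all ones. */
	if (pci_status == 0xffff)
		return DEATH_CONFIG_DEAD;

	/* Config space still answers, but the device recorded a PCI error. */
	if (pci_status & PCI_STATUS_ERROR_MASK)
		return DEATH_PCI_ERROR;

	/*
	 * PCI side looks clean; the failure shows up only in CSTS.
	 * Note that an all-ones CSTS (failed MMIO read) also matches here.
	 */
	if (csts & NVME_CSTS_CFS)
		return DEATH_CSTS_ONLY;

	return DEATH_UNKNOWN;
}
```

Under this sketch, the 0xffff PCI_STATUS reading from the SM951 mentioned above would be classified as `DEATH_CONFIG_DEAD` regardless of what CSTS returned. Checking the all-ones case first matters, because 0xffff also has every error bit set.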

Totally fine with this, but I want to mention that even the MMIO read
has caused problems when it races a pciehp hot-plug event. A config read
in that case is another opportunity for a completion timeout, unless I can
get Bjorn to apply the patch series that disables config access on
surprise-removed devices. Or maybe our nvme health-check polling
implementation is misguided.

Reviewed-by: Keith Busch <keith.busch at intel.com>