[PATCH v3 1/1] nvme-pci : Fix EEH failure on ppc after subsystem reset

Hannes Reinecke hare at suse.de
Fri Jun 14 02:51:14 PDT 2024


On 6/4/24 11:10, Nilay Shroff wrote:
> The NVMe subsystem reset command when executed may cause the loss of
> the NVMe adapter communication with kernel. And the only way today
> to recover the adapter is to either re-enumerate the pci bus or
> hotplug NVMe disk or reboot OS.
> 
> The PPC architecture supports mechanism called EEH (enhanced error
> handling) which allows pci bus errors to be cleared and a pci card to
> be rebooted, without having to physically hotplug NVMe disk or reboot
> the OS.
> 
> In the current implementation when user executes the nvme subsystem
> reset command and if kernel loses the communication with NVMe adapter
> then subsequent read/write to the PCIe config space of the device
> would fail. Failing to read/write to PCI config space makes NVMe
> driver assume the permanent loss of communication with the device and
> so driver marks the NVMe controller dead and frees all resources
> associate to that controller. As the NVMe controller goes dead, the
> EEH recovery can't succeed.
> 
> This patch helps fix this issue so that after user executes subsystem
> reset command if the communication with the NVMe adapter is lost and
> EEH recovery is initiated then we allow the EEH recovery to forward
> progress and gives the EEH thread a fair chance to recover the
> adapter. If in case, the EEH thread couldn't recover the adapter
> communication then it sets the pci channel state of the erring
> adapter to "permanent failure" and removes the device.
> 
> Signed-off-by: Nilay Shroff <nilay at linux.ibm.com>
> ---
>   drivers/nvme/host/core.c |  1 +
>   drivers/nvme/host/pci.c  | 21 ++++++++++++++++++---
>   2 files changed, 19 insertions(+), 3 deletions(-)
> 
Looks odd, but I cannot see a better way out here.

Reviewed-by: Hannes Reinecke <hare at suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich




More information about the Linux-nvme mailing list