[PATCH RESEND] nvme-pci: Fix EEH failure on ppc after subsystem reset

Keith Busch kbusch at kernel.org
Tue Mar 12 07:30:07 PDT 2024


On Mon, Mar 11, 2024 at 06:28:21PM +0530, Nilay Shroff wrote:
> @@ -3295,10 +3304,13 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
>         case pci_channel_io_frozen:
>                 dev_warn(dev->ctrl.device,
>                         "frozen state error detected, reset controller\n");
> -               if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> -                       nvme_dev_disable(dev, true);
> -                       return PCI_ERS_RESULT_DISCONNECT;
> +               if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING) {
> +                       if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
> +                               nvme_dev_disable(dev, true);
> +                               return PCI_ERS_RESULT_DISCONNECT;
> +                       }
>                 }
> +               flush_work(&dev->ctrl.reset_work);

I was messing with a similar idea. I wasn't sure if EEH calls the error
handler inline with the error, in which case this would try to flush the
work within the same work, which obviously doesn't work. As long as its
called from a different thread, then this should be fine.

>                 nvme_dev_disable(dev, false);
>                 return PCI_ERS_RESULT_NEED_RESET;
>         case pci_channel_io_perm_failure:
> 
> The flush_work() would ensure that we don't disable the ctrl if reset_work 
> is running. If the rest_work is *not* running currently then flush_work() should
> return immediately. Moreover, if reset_work is scheduled or start running after
> flush_work() returns then reset_work should not be able to get upto the CONNECTING
> state because pci recovery is in progress and so it should fail early.
> 
> On the reset_work side other than detecting pci error recovery, I think we also 
> need one another change where in case the ctrl state is set to CONNECTING and we 
> detect the pci error recovery in progress then before returning from the reset_work
> we set the ctrl state to RESETTING so that error_detected() could forward progress.
> The changes should be something as below:
> 
> @@ -2776,6 +2776,16 @@ static void nvme_reset_work(struct work_struct *work)
>   out_unlock:
>         mutex_unlock(&dev->shutdown_lock);
>   out:
> +       /*
> +        * If PCI recovery is ongoing then let it finish first
> +        */
> +       if (pci_channel_offline(to_pci_dev(dev->dev))) {
> +               dev_warn(dev->ctrl.device, "PCI recovery is ongoing so let it finish\n");
> +               if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING)
> +                       WRITE_ONCE(dev->ctrl.state, NVME_CTRL_RESETTING);

This may break the state machine, like if the device was hot removed
during all this error handling. This will force the state back to
RESETTING when it should be DEAD.

I think what you need is just allow a controller to reset from a
connecting state. Have to be careful that wouldn't break any other
expectations, though.



More information about the Linux-nvme mailing list