[PATCH RESEND] nvme-pci: Fix EEH failure on ppc after subsystem reset
Nilay Shroff
nilay at linux.ibm.com
Sat Mar 9 11:05:06 PST 2024
On 3/9/24 21:14, Keith Busch wrote:
> On Sat, Mar 09, 2024 at 07:59:11PM +0530, Nilay Shroff wrote:
>> On 3/8/24 21:11, Keith Busch wrote:
>>> On Fri, Feb 09, 2024 at 10:32:16AM +0530, Nilay Shroff wrote:
>>>> @@ -2776,6 +2776,14 @@ static void nvme_reset_work(struct work_struct *work)
>>>> out_unlock:
>>>> mutex_unlock(&dev->shutdown_lock);
>>>> out:
>>>> + /*
>>>> + * If PCI recovery is ongoing then let it finish first
>>>> + */
>>>> + if (pci_channel_offline(to_pci_dev(dev->dev))) {
>>>> + dev_warn(dev->ctrl.device, "PCI recovery is ongoing so let it finish\n");
>>>> + return;
>>>> + }
>>>> +
>>>> /*
>>>> * Set state to deleting now to avoid blocking nvme_wait_reset(), which
>>>> * may be holding this pci_dev's device lock.
>>>> @@ -3295,9 +3303,11 @@ static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
>>>> case pci_channel_io_frozen:
>>>> dev_warn(dev->ctrl.device,
>>>> "frozen state error detected, reset controller\n");
>>>> - if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
>>>> - nvme_dev_disable(dev, true);
>>>> - return PCI_ERS_RESULT_DISCONNECT;
>>>> + if (nvme_ctrl_state(&dev->ctrl) != NVME_CTRL_RESETTING) {
>>>> + if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_RESETTING)) {
>>>> + nvme_dev_disable(dev, true);
>>>> + return PCI_ERS_RESULT_DISCONNECT;
>>>> + }
>>>> }
>>>> nvme_dev_disable(dev, false);
>>>> return PCI_ERS_RESULT_NEED_RESET;
>>>
>>> I get what you're trying to do, but it looks racy. The reset_work may
>>> finish before pci sets channel offline, or the error handling work
>>> happens to see RESETTING state, but then transitions to CONNECTING state
>>> after and deadlocks on the '.resume()' side. You are counting on a very
>>> specific sequence tied to the PCIe error handling module, and maybe you
>>> are able to count on that sequence for your platform in this unique
>>> scenario, but these link errors could happen anytime.
>>>
>> I am not sure about the deadlock on the '.resume()' side you mentioned above.
>> Did you mean that the deadlock occurs because someone is holding this pci_dev's
>> device lock? Or that it occurs because the flush_work() from nvme_error_resume()
>> would never return?
>
> Your patch may observe a ctrl in "RESETTING" state from
> error_detected(), then disable the controller, which quiesces the admin
> queue. Meanwhile, reset_work may proceed to CONNECTING state and try
> nvme_submit_sync_cmd(), which blocks forever because no one is going to
> unquiesce that admin queue.
>
OK, I think I got your point. However, it seems that even without my patch the
above-mentioned deadlock is still possible. Without my patch, if error_detected()
observes the ctrl in "RESETTING" state, it still invokes nvme_dev_disable(); the only
difference with my patch is that error_detected() then returns PCI_ERS_RESULT_NEED_RESET
instead of PCI_ERS_RESULT_DISCONNECT, so the same interleaving (spelled out below) can
still occur.
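
Just to spell out the interleaving as I understand it (this is my reading of the
code paths, not a captured trace):

  EEH / error_detected()                  nvme_reset_work()
  -------------------------------------   -------------------------------------
  sees ctrl state == RESETTING
  nvme_dev_disable(dev, false)
    -> quiesces the admin queue
                                          nvme_change_ctrl_state(CONNECTING)
                                          nvme_submit_sync_cmd() on admin queue
                                          -> blocks forever, since nothing will
                                             unquiesce the admin queue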
Regarding the deadlock, it appears to me that reset_work races with nvme_dev_disable(),
and we may want to extend the scope of shutdown_lock in reset_work so that
nvme_dev_disable() cannot quiesce the admin queue while reset_work is still using it.
I think we can fix this case; a rough sketch of the idea is below. I will send a
PATCH v2 with this fix for review, but please let me know if you have any other
concerns before I spin a new patch.
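
Untested sketch of what I mean, against nvme_reset_work() in drivers/nvme/host/pci.c
(the elided parts and the exact error-path labels would of course follow the real
code; this is only to show where the lock scope would be extended):

    static void nvme_reset_work(struct work_struct *work)
    {
            struct nvme_dev *dev =
                    container_of(work, struct nvme_dev, ctrl.reset_work);
            int result;

            ...
            mutex_lock(&dev->shutdown_lock);
            /* (re)enable the device and the admin queue, as today */
            ...

            /*
             * Idea: do NOT drop shutdown_lock here.  Keep holding it across
             * the transition to CONNECTING and the admin commands issued
             * while (re)connecting, so that a concurrent nvme_dev_disable()
             * from the error handler (which also takes shutdown_lock) cannot
             * quiesce the admin queue while we are still submitting on it.
             */
            if (!nvme_change_ctrl_state(&dev->ctrl, NVME_CTRL_CONNECTING)) {
                    dev_warn(dev->ctrl.device,
                             "failed to mark controller CONNECTING\n");
                    result = -EBUSY;
                    goto out_unlock;
            }

            /* admin commands needed to bring the controller back up */
            ...

            mutex_unlock(&dev->shutdown_lock);
            ...
            return;

     out_unlock:
            mutex_unlock(&dev->shutdown_lock);
     out:
            ...
    }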
Thanks,
--Nilay