[PATCH] nvme/pci: Sync controller reset for AER slot_reset

Alex G. mr.nuke.me at gmail.com
Thu May 10 12:20:34 PDT 2018



On 05/10/2018 02:14 PM, Keith Busch wrote:
> On Thu, May 10, 2018 at 01:56:56PM -0500, Alex G. wrote:
>>> @@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
>>>  
>>>  	dev_info(dev->ctrl.device, "restart after slot reset\n");
>>>  	pci_restore_state(pdev);
>>> -	nvme_reset_ctrl(&dev->ctrl);
>>> -	return PCI_ERS_RESULT_RECOVERED;
>>> +	nvme_reset_ctrl_sync(&dev->ctrl);
>>
>> This does wonders when nvme_reset_ctrl_sync() returns in a timely
>> manner. I was also able to get the nvme drive in a state where
>> nvme_reset_ctrl_sync() does not return. Then we end up with the device
>> lock in report_slot_reset, which, as you may imagine, is not a great thing.
> 
> It never returns? That shouldn't happen. There are cases where it may take
> a very long time, depending on what the controller reports in CAP.TO. The
> only other case it may stall is if the controller never responds to the
> initialization admin commands, but that should delay by 60 seconds under
> default parameters.

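For reference, the CAP.TO bound mentioned above can be worked out with a
little arithmetic. This is only a user-space sketch, not driver code: it
assumes the NVMe spec layout, where the Timeout (TO) field sits in bits
31:24 of the CAP register and counts in 500 ms units. The helper names
example_cap_to() and example_cap_to_ms() are made up for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract the Timeout (TO) field from a raw 64-bit
 * CAP register value. Per the NVMe spec, TO occupies bits 31:24.
 */
static inline uint32_t example_cap_to(uint64_t cap)
{
	return (uint32_t)((cap >> 24) & 0xff);
}

/*
 * Hypothetical helper: convert CAP.TO to milliseconds. The field is
 * expressed in 500 ms units, so the largest value (0xff) is 127500 ms.
 */
static inline uint32_t example_cap_to_ms(uint64_t cap)
{
	return example_cap_to(cap) * 500u;
}
```

Since the field is only 8 bits wide, the largest wait a controller can
advertise this way is 255 * 500 ms = 127.5 seconds.
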
It took 28 minutes before I gave up and rebooted the machine. Maybe I
should have waited 30.
Even 60 seconds seems like a terribly long time to wait in AER. Simple
things like block IO and 'nvme list' hang in kernel space for that entire
time. I can raise a separate issue once I find a reliable way to
reproduce this.

Alex


