[PATCH] nvme/pci: Sync controller reset for AER slot_reset

Ming Lei tom.leiming at gmail.com
Sat May 12 02:27:01 PDT 2018


On Fri, May 11, 2018 at 2:56 AM, Alex G. <mr.nuke.me at gmail.com> wrote:
>
>
> On 05/10/2018 11:01 AM, Keith Busch wrote:
>> AER handling expects that a successful return from slot_reset means the
>> driver made the device functional again. The nvme driver had been using
>> an asynchronous reset to recover the device, so the device
>> may still be initializing after control is returned to the
>> AER handler. This creates problems for subsequent event handling,
>> causing the initialization to fail.
>>
>> This patch fixes that by syncing the controller reset before returning
>> to the AER driver, and reporting the true state of the reset.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=199657
>> Reported-by: Alex Gagniuc <mr.nuke.me at gmail.com>
>
> Tested-by: Alex Gagniuc <mr.nuke.me at gmail.com>
>
> Sponsored-by: DellEMC
> You know I had to add that plug somewhere :p
>
>> Cc: Sinan Kaya <okaya at codeaurora.org>
>> Cc: Bjorn Helgaas <bhelgaas at google.com>
>> Cc: <stable at vger.kernel.org>
>> Signed-off-by: Keith Busch <keith.busch at intel.com>
>> ---
>>  drivers/nvme/host/pci.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index b542dce45927..2e221796257a 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -2681,8 +2681,15 @@ static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
>>
>>       dev_info(dev->ctrl.device, "restart after slot reset\n");
>>       pci_restore_state(pdev);
>> -     nvme_reset_ctrl(&dev->ctrl);
>> -     return PCI_ERS_RESULT_RECOVERED;
>> +     nvme_reset_ctrl_sync(&dev->ctrl);
>
> This does wonders when nvme_reset_ctrl_sync() returns in a timely
> manner. I was also able to get the nvme drive in a state where
> nvme_reset_ctrl_sync() does not return. Then we end up with the device
> lock in report_slot_reset, which, as you may imagine, is not a great thing.
>
> I think this step is a move in the better direction, but we still have
> problems.

If IOs issued from nvme_reset_work() time out, nvme_reset_ctrl_sync()
may never return, but I am not sure whether that is your case.

You can find where it hangs by looking for tasks in uninterruptible
sleep (state D) via 'ps -ax | grep D' and then running
'cat /proc/$PID/stack' on each of them.
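
A rough sketch of that check, in case it helps: a bare 'grep D' also
matches any line that merely contains a capital D, so the script below
filters on the process state column instead. The function name
filter_d_state is made up for illustration. It assumes a procps-style ps
and a kernel built with CONFIG_STACKTRACE so that /proc/$PID/stack
exists; that file is normally readable only by root.

```shell
#!/bin/sh
# filter_d_state is a hypothetical helper name, not an existing tool.
# It reads "PID STAT COMM" lines on stdin and prints "PID COMM" for
# tasks whose state starts with D (uninterruptible sleep).
filter_d_state() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# List every task, keep the blocked ones, and dump each kernel stack.
# The trailing '=' in each column name suppresses the header line.
ps -eo pid=,stat=,comm= | filter_d_state | while read -r pid comm; do
    echo "=== $pid ($comm) ==="
    cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable, try as root)"
done
```

If the reset is stuck, the stack of the blocked task should show which
call it is waiting in. On kernels with magic SysRq enabled,
'echo w > /proc/sysrq-trigger' dumps all blocked tasks to the kernel log
in one step.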

-- 
Ming Lei