[PATCH RESEND] nvme-pci: Fix EEH failure on ppc after subsystem reset
Nilay Shroff
nilay at linux.ibm.com
Thu Feb 29 04:27:56 PST 2024
Hi Keith,
On 2/28/24 16:49, Nilay Shroff wrote:
>
>
> On 2/27/24 23:59, Keith Busch wrote:
>> On Fri, Feb 09, 2024 at 10:32:16AM +0530, Nilay Shroff wrote:
>>> If the nvme subsystem reset causes the loss of communication to the nvme
>>> adapter then EEH could potentially recover the adapter. The detection of
>>> communication loss to the adapter only happens when the nvme driver
>>> attempts to read an MMIO register.
>>>
>>> The nvme subsystem reset command writes 0x4E564D65 to the NSSR register and
>>> schedules an adapter reset. If the nvme subsystem reset caused the loss
>>> of communication to the nvme adapter then either an IO timeout event or
>>> the adapter reset handler could detect it. If the IO timeout event detects the
>>> loss of communication then the EEH handler is able to recover the
>>> communication to the adapter. This change was implemented in 651438bb0af5
>>> (nvme-pci: Fix EEH failure on ppc). However, if the adapter communication
>>> loss is detected in the nvme reset work handler then EEH is unable to
>>> successfully finish the adapter recovery.
>>>
>>> This patch ensures that,
>>> - the nvme driver reset handler observes that the pci channel is offline after
>>> a failed MMIO read and avoids marking the controller state DEAD, and
>>> thus gives a fair chance to the EEH handler to recover the nvme adapter.
>>>
>>> - if the nvme controller is already in the RESETTING state and a pci channel
>>> frozen error is detected then the nvme driver pci-error-handler code sends the
>>> correct error code (PCI_ERS_RESULT_NEED_RESET) back to the EEH handler
>>> so that the EEH handler can proceed with the pci slot reset.
>>
>> A subsystem reset takes the link down. I'm pretty sure the proper way to
>> recover from it requires pcie hotplug support.
>>
> Yes, you're correct. We require pcie hotplugging to recover. However, the powerpc EEH
> handler is able to recover the pcie adapter without physically removing and
> re-inserting the adapter, or in other words, it can reset the adapter without
> hotplug activity. In fact, powerpc EEH can isolate the pcie slot and reset it
> (i.e. reset the PCI device by holding the PCI #RST line high for two seconds),
> followed by setting up the device config space (the base address registers
> (BARs), latency timer, cache line size, interrupt line, and so on).
>
> You may find more information about EEH recovery here:
> https://www.kernel.org/doc/Documentation/powerpc/eeh-pci-error-recovery.txt
>
> Typically, when a pcie error is detected and EEH is able to recover the device,
> the EEH handler code goes through the sequence below (assuming the driver is EEH aware):
>
> eeh_handle_normal_event()
> eeh_set_channel_state()-> set state to pci_channel_io_frozen
> eeh_report_error()
> nvme_error_detected() -> channel state "pci_channel_io_frozen"; returns PCI_ERS_RESULT_NEED_RESET
> eeh_slot_reset() -> recovery successful
> nvme_slot_reset() -> returns PCI_ERS_RESULT_RECOVERED
> eeh_set_channel_state()-> set state to pci_channel_io_normal
> nvme_error_resume()
>
> In case a pcie error is detected and EEH is unable to recover the device,
> the EEH handler code goes through the sequence below (a sketch of the driver-side
> hooks involved follows both sequences):
>
> eeh_handle_normal_event()
> eeh_set_channel_state()-> set state to pci_channel_io_frozen
> eeh_report_error()
> nvme_error_detected() -> channel state pci_channel_io_frozen; returns PCI_ERS_RESULT_NEED_RESET
> eeh_slot_reset() -> recovery failed
> eeh_report_failure()
> nvme_error_detected()-> channel state pci_channel_io_perm_failure; returns PCI_ERS_RESULT_DISCONNECT
> eeh_set_channel_state()-> set state to pci_channel_io_perm_failure
> nvme_remove()
>
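
For reference, the driver-side hooks that both sequences above call into look
roughly like the sketch below. This is only a simplified illustration of the
shape of the handlers in drivers/nvme/host/pci.c, not the exact upstream code;
the stub bodies exist only to keep the snippet self-contained.

#include <linux/pci.h>

/* Stub: the real driver restarts the controller here and only then
 * reports PCI_ERS_RESULT_RECOVERED back to the EEH core. */
static pci_ers_result_t nvme_slot_reset(struct pci_dev *pdev)
{
	return PCI_ERS_RESULT_RECOVERED;
}

/* Stub: the real driver resumes I/O once recovery has finished. */
static void nvme_error_resume(struct pci_dev *pdev)
{
}

static pci_ers_result_t nvme_error_detected(struct pci_dev *pdev,
					    pci_channel_state_t state)
{
	switch (state) {
	case pci_channel_io_normal:
		return PCI_ERS_RESULT_CAN_RECOVER;
	case pci_channel_io_frozen:
		/*
		 * Ask the EEH core to reset the slot. With the proposed
		 * patch this is also the answer when the controller is
		 * already RESETTING (e.g. right after subsystem-reset).
		 */
		return PCI_ERS_RESULT_NEED_RESET;
	case pci_channel_io_perm_failure:
		/* Recovery failed: eeh_report_failure() -> nvme_remove() */
		return PCI_ERS_RESULT_DISCONNECT;
	}
	return PCI_ERS_RESULT_NEED_RESET;
}

static const struct pci_error_handlers nvme_err_handler = {
	.error_detected	= nvme_error_detected,
	.slot_reset	= nvme_slot_reset,
	.resume		= nvme_error_resume,
};
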
>
> If we execute the command "nvme subsystem-reset ..." and adapter communication is
> lost, then in the current code (under nvme_reset_work()) we simply disable the device
> and mark the controller DEAD. However, we may have a chance to recover the controller
> if the driver is EEH aware and EEH recovery is underway. We already handle one such case
> in nvme_timeout(). So this patch ensures that if we fall through nvme_reset_work()
> post subsystem-reset and EEH recovery is in progress, then we give the
> EEH mechanism a chance to recover the adapter. If the EEH recovery is unsuccessful, then
> we anyway fall through the code path mentioned above, where we invoke nvme_remove() at
> the end and delete the erring controller.
>
BTW, a similar issue was fixed earlier in 651438bb0af5 (nvme-pci: Fix EEH failure on ppc).
That fix was needed because the controller health check polling was removed in
b2a0eb1a0ac72869 (nvme-pci: Remove watchdog timer). In fact, today we may be able to recover
the NVMe adapter if a subsystem-reset or any other PCI error occurs while some
I/O request is in flight. The recovery is possible because the in-flight I/O request would
eventually time out, and nvme_timeout() has special code (added in 651438bb0af5) that
gives EEH a chance to recover the adapter, as sketched below.
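
The check added there is roughly of the following shape (a sketch of the early
part of nvme_timeout(), simplified and possibly not matching the current
upstream lines exactly):

	/*
	 * If PCI error recovery (EEH) is already in progress, do not
	 * escalate this timeout into a controller reset; just restart
	 * the timer and let the recovery mechanism finish.
	 */
	if (pci_channel_offline(to_pci_dev(dev->dev)))
		return BLK_EH_RESET_TIMER;
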
However, later, in 1e866afd4bcdd (nvme: ensure subsystem reset is single threaded), the
nvme subsystem reset code was reworked, so now when the user executes the subsystem-reset
command, the kernel first writes 0x4E564D65 to the nvme NSSR register and then schedules the
adapter reset. It is quite possible that when subsystem-reset is executed there is no I/O in
flight, and hence we may never hit nvme_timeout(). Later, when the adapter reset code (under
nvme_reset_work()) starts execution, it accesses MMIO registers. Hence, IMO,
nvme_reset_work() also potentially needs changes similar to those in nvme_timeout() so
that EEH recovery becomes possible.
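
Concretely, the patch adds an early bail-out of the same flavour to the reset
path. A minimal sketch of the idea (simplified; see the patch itself for the
exact placement inside nvme_reset_work()):

	/*
	 * A failed MMIO read left the PCI channel offline, which means
	 * EEH recovery is in progress. Do not mark the controller DEAD;
	 * let nvme_slot_reset()/nvme_error_resume() restart it instead.
	 */
	if (pci_channel_offline(to_pci_dev(dev->dev))) {
		dev_warn(dev->ctrl.device,
			 "PCI recovery is ongoing so let it finish\n");
		return;
	}
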
> With the proposed patch, we find that EEH recovery is successful post subsystem-reset.
> Please find below the relevant output:
> # lspci
> 0524:28:00.0 Non-Volatile memory controller: KIOXIA Corporation NVMe SSD Controller CM7 2.5" (rev 01)
>
> # nvme list-subsys
> nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:7DQ0A01206N3
> hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
> iopolicy=numa
> \
> +- nvme0 pcie 0524:28:00.0 live
>
> # nvme subsystem-reset /dev/nvme0
>
> # nvme list-subsys
> nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:7DQ0A01206N3
> hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
> iopolicy=numa
> \
> +- nvme0 pcie 0524:28:00.0 resetting
>
> [10556.034082] EEH: Recovering PHB#524-PE#280000
> [10556.034108] EEH: PE location: N/A, PHB location: N/A
> [10556.034112] EEH: Frozen PHB#524-PE#280000 detected
> [10556.034115] EEH: Call Trace:
> [10556.034117] EEH: [c000000000051068] __eeh_send_failure_event+0x7c/0x15c
> [10556.034304] EEH: [c000000000049bcc] eeh_dev_check_failure.part.0+0x27c/0x6b0
> [10556.034310] EEH: [c008000004753d3c] nvme_pci_reg_read32+0x80/0xac [nvme]
> [10556.034319] EEH: [c0080000045f365c] nvme_wait_ready+0xa4/0x18c [nvme_core]
> [10556.034333] EEH: [c008000004754750] nvme_dev_disable+0x370/0x41c [nvme]
> [10556.034338] EEH: [c008000004757184] nvme_reset_work+0x1f4/0x3cc [nvme]
> [10556.034344] EEH: [c00000000017bb8c] process_one_work+0x1f0/0x4f4
> [10556.034350] EEH: [c00000000017c24c] worker_thread+0x3bc/0x590
> [10556.034355] EEH: [c00000000018a87c] kthread+0x138/0x140
> [10556.034358] EEH: [c00000000000dd58] start_kernel_thread+0x14/0x18
> [10556.034363] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
> [10556.034368] EEH: Notify device drivers to shutdown
> [10556.034371] EEH: Beginning: 'error_detected(IO frozen)'
> [10556.034376] PCI 0524:28:00.0#280000: EEH: Invoking nvme->error_detected(IO frozen)
> [10556.034379] nvme nvme0: frozen state error detected, reset controller
> [10556.102654] nvme 0524:28:00.0: enabling device (0000 -> 0002)
> [10556.103171] nvme nvme0: PCI recovery is ongoing so let it finish
> [10556.142532] PCI 0524:28:00.0#280000: EEH: nvme driver reports: 'need reset'
> [10556.142535] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
> [...]
> [...]
> [10556.148172] EEH: Reset without hotplug activity
> [10558.298672] EEH: Beginning: 'slot_reset'
> [10558.298692] PCI 0524:28:00.0#280000: EEH: Invoking nvme->slot_reset()
> [10558.298696] nvme nvme0: restart after slot reset
> [10558.301925] PCI 0524:28:00.0#280000: EEH: nvme driver reports: 'recovered'
> [10558.301928] EEH: Finished:'slot_reset' with aggregate recovery state:'recovered'
> [10558.301939] EEH: Notify device driver to resume
> [10558.301944] EEH: Beginning: 'resume'
> [10558.301947] PCI 0524:28:00.0#280000: EEH: Invoking nvme->resume()
> [10558.331051] nvme nvme0: Shutdown timeout set to 10 seconds
> [10558.356679] nvme nvme0: 16/0/0 default/read/poll queues
> [10558.357026] PCI 0524:28:00.0#280000: EEH: nvme driver reports: 'none'
> [10558.357028] EEH: Finished:'resume'
> [10558.357035] EEH: Recovery successful.
>
> # nvme list-subsys
> nvme-subsys0 - NQN=nqn.2019-10.com.kioxia:KCM7DRUG1T92:7DQ0A01206N3
> hostnqn=nqn.2014-08.org.nvmexpress:uuid:41528538-e8ad-4eaf-84a7-9c552917d988
> iopolicy=numa
> \
> +- nvme0 pcie 0524:28:00.0 live
>
>
Thanks,
--Nilay