nvme: machine check when running nvme subsystem-reset /dev/nvme0 against direct attach via PCIE slot
Nilay Shroff
nilay at linux.ibm.com
Thu Sep 26 23:10:05 PDT 2024
On 9/27/24 02:41, Laurence Oberman wrote:
> Hi Keith
> Hope all is well
>
> Quick question (expected or not)
>
> It was reported to Red Hat that users are seeing issues when using the
> "nvme subsystem-reset /dev/nvme0" command to test resets.
>
> On multiple servers I tested two types of nvme-attached devices.
> These are not the rootfs devices.
>
> 1. The front-slot (hotplug) devices in a 2.5in format reset and,
> after some time, recover (which is expected)
>
> Example of one working
>
> Does not trap and end up as a machine check
>
> [ 2215.440468] pcieport 0000:10:01.1: AER: Multiple Uncorrected (Non-
> Fatal) error received: 0000:12:13.0
> [ 2215.440532] pcieport 0000:12:13.0: PCIe Bus Error:
> severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester
> ID)
> [ 2215.440536] pcieport 0000:12:13.0: device [10b5:8748] error
> status/mask=00100000/00000000
> [ 2215.440540] pcieport 0000:12:13.0: [20] UnsupReq
> (First)
> [ 2215.440544] pcieport 0000:12:13.0: AER: TLP Header: 40009001
> 1000000f e9211000 12000000
> [ 2215.441813] systemd-journald[2173]: Sent WATCHDOG=1 notification.
> [ 2216.937498] {1}[Hardware Error]: Hardware error from APEI Generic
> Hardware Error Source: 4
> [ 2216.937505] {1}[Hardware Error]: event severity: info
> [ 2216.937508] {1}[Hardware Error]: Error 0, type: fatal
> [ 2216.937511] {1}[Hardware Error]: fru_text: PcieError
> [ 2216.937514] {1}[Hardware Error]: section_type: PCIe error
> [ 2216.937515] {1}[Hardware Error]: port_type: 4, root port
> [ 2216.937517] {1}[Hardware Error]: version: 0.2
> [ 2216.937519] {1}[Hardware Error]: command: 0x0407, status: 0x0010
> [ 2216.937522] {1}[Hardware Error]: device_id: 0000:10:01.1
> [ 2216.937524] {1}[Hardware Error]: slot: 3
> [ 2216.937525] {1}[Hardware Error]: secondary_bus: 0x11
> [ 2216.937526] {1}[Hardware Error]: vendor_id: 0x1022, device_id:
> 0x1453
> [ 2216.937528] {1}[Hardware Error]: class_code: 060400
> [ 2216.937529] {1}[Hardware Error]: bridge: secondary_status: 0x2000,
> control: 0x0012
> [ 2216.937530] {1}[Hardware Error]: aer_uncor_status: 0x00000000,
> aer_uncor_mask: 0x04500000
> [ 2216.937532] {1}[Hardware Error]: aer_uncor_severity: 0x004e2030
> [ 2216.937532] {1}[Hardware Error]: TLP Header: 00000000 00000000
> 00000000 00000000
> [ 2216.937629] pcieport 0000:10:01.1: AER: aer_status: 0x00000000,
> aer_mask: 0x04500000
> [ 2216.937634] pcieport 0000:10:01.1: AER: aer_layer=Transaction Layer,
> aer_agent=Receiver ID
> [ 2216.937638] pcieport 0000:10:01.1: AER: aer_uncor_severity:
> 0x004e2030
> [ 2216.937645] nvme nvme4: frozen state error detected, reset
> controller
> [ 2217.071095] nvme nvme10: frozen state error detected, reset
> controller
> [ 2217.096928] nvme nvme0: frozen state error detected, reset
> controller
> [ 2217.118947] nvme nvme18: frozen state error detected, reset
> controller
> [ 2217.138945] nvme nvme6: frozen state error detected, reset
> controller
> [ 2217.164918] nvme nvme14: frozen state error detected, reset
> controller
> [ 2217.186902] nvme nvme20: frozen state error detected, reset
> controller
> [ 2279.420266] nvme 0000:1a:00.0: Unable to change power state from
> D3cold to D0, device inaccessible
> [ 2279.420329] nvme nvme22: Disabling device after reset failure: -19
> [ 2279.464727] pcieport 0000:12:13.0: AER: device recovery failed
> [ 2279.464823] pcieport 0000:12:13.0: pciehp: pcie_do_write_cmd: no
> response from device
>
> Port resets and recovers
>
> [ 2279.593196] pcieport 0000:10:01.1: AER: Root Port link has been
> reset (0)
> [ 2279.593699] nvme nvme4: restart after slot reset
> [ 2279.593949] nvme nvme10: restart after slot reset
> [ 2279.594222] nvme nvme0: restart after slot reset
> [ 2279.594453] nvme nvme18: restart after slot reset
> [ 2279.594728] nvme nvme6: restart after slot reset
> [ 2279.594984] nvme nvme14: restart after slot reset
> [ 2279.595226] nvme nvme20: restart after slot reset
> [ 2279.595435] pcieport 0000:12:13.0: pciehp: Slot(19): Card present
> [ 2279.595441] pcieport 0000:12:13.0: pciehp: Slot(19): Link Up
> [ 2279.609081] nvme nvme4: Shutdown timeout set to 8 seconds
> [ 2279.617532] nvme nvme0: Shutdown timeout set to 8 seconds
> [ 2279.617533] nvme nvme14: Shutdown timeout set to 8 seconds
> [ 2279.618028] nvme nvme6: Shutdown timeout set to 8 seconds
> [ 2279.618207] nvme nvme18: Shutdown timeout set to 8 seconds
> [ 2279.618290] nvme nvme10: Shutdown timeout set to 8 seconds
> [ 2279.618308] nvme nvme20: Shutdown timeout set to 8 seconds
> [ 2279.631961] nvme nvme4: 32/0/0 default/read/poll queues
> [ 2279.643293] nvme nvme14: 32/0/0 default/read/poll queues
> [ 2279.643372] nvme nvme0: 32/0/0 default/read/poll queues
> [ 2279.644881] nvme nvme6: 32/0/0 default/read/poll queues
> [ 2279.644966] nvme nvme10: 32/0/0 default/read/poll queues
> [ 2279.645030] nvme nvme18: 32/0/0 default/read/poll queues
> [ 2279.645132] nvme nvme20: 32/0/0 default/read/poll queues
> [ 2279.645202] pcieport 0000:10:01.1: AER: device recovery successful
>
> 2. Any kernel (latest upstream 6.11, RHEL8, or RHEL9) causes
> a machine check and panics the box when the reset is run against an
> nvme in a PCIe slot
>
> [ 263.862919] mce: [Hardware Error]: CPU 12: Machine Check Exception: 5
> Bank 6: ba00000000000e0b
> [ 263.862924] mce: [Hardware Error]: RIP !INEXACT!
> 10:<ffffffff8571dce4> {intel_idle+0x54/0x90}
> [ 263.862931] mce: [Hardware Error]: TSC 7a47d8d62ba6dd MISC 83100000
> [ 263.862933] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1727384194
> SOCKET 1 APIC 40 microcode d0003a5
> [ 263.862936] mce: [Hardware Error]: Run the above through 'mcelog --
> ascii'
> [ 263.885254] mce: [Hardware Error]: Machine check: Processor context
> corrupt
> [ 263.885259] Kernel panic - not syncing: Fatal machine check
>
> Hardware event. This is not a software error.
> CPU 0 BANK 0 TSC 7a47d8d62ba6dd
> RIP !INEXACT! 10:ffffffff8571dce4
> TIME 1727384194 Thu Sep 26 16:56:34 2024
> MCG status:
> MCi status:
> Machine check not valid
> Corrected error
> MCA: No Error
> STATUS 0 MCGSTATUS 0
> CPUID Vendor Intel Family 6 Model 106 Step 6
> RIP: intel_idle+0x54/0x90
> SOCKET 1 APIC 40 microcode d0003a5
> Run the above through 'mcelog --ascii'
> Machine check: Processor context corrupt
>
> Regards
> Laurence
>
>
>
I think Keith's email address is not correct; I'm adding his correct email address here.
BTW, Keith recently helped fix an issue in kernel v6.11 with the nvme subsystem-reset command to ensure
that we recover the nvme disk on PPC. On the PPC architecture we use EEH to recover the disk after a
subsystem-reset, but yours is an Intel machine, which uses AER for recovery. So I'm not sure whether that
same commit 210b1f6576e8 ("nvme-pci: do not directly handle subsys reset fallout"), which was merged in
kernel v6.11, is causing a side effect on the Intel machine.
Would you please revert the above commit and see if that helps fix the observed symptom on your
Intel machine?
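
If it helps, here is a rough sketch of the revert-and-retest steps (the tree path and the build/boot
commands are assumptions; please adjust them for your setup):

    cd ~/linux                       # path to your kernel git tree (assumption)
    git revert 210b1f6576e8          # "nvme-pci: do not directly handle subsys reset fallout"
    make -j"$(nproc)"
    sudo make modules_install install
    sudo reboot
    # after reboot, repeat the failing test against the PCIe-slot device:
    sudo nvme subsystem-reset /dev/nvme0
    dmesg -w                         # watch whether AER recovery runs or the box machine-checks

If the machine check no longer triggers with the revert in place, that would point at the above commit.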
Thanks,
--Nilay