frozen PHB on IBM Power9 system in 6.15-rc2 (bisected)
Dan Horák
dan at danny.cz
Thu Apr 17 08:10:26 PDT 2025
Hi,
I am seeing a "frozen PHB" error on a Power9 bare-metal (PowerNV, ppc64le)
system with the 6.15-rc2 kernel (originally seen with
kernel-6.15.0-0.rc2.22.fc43). It leaves the NVMe drives, which sit behind a
switch, inaccessible. I bisected the issue to commit
62baf70c327444338c34703c71aa8cc8e4189bd6 [1].
Please see [2] for the full console log and other details. Please ignore
the "soft-lockup" messages; they are unrelated and will be resolved
by [3]. We are building the kernel with CONFIG_NVME_MULTIPATH=y.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62baf70c327444338c34703c71aa8cc8e4189bd6
[2] https://fedora.danny.cz/ppc/rdsosreport.txt
[3] https://lore.kernel.org/all/20250410125110.1232329-1-gshan@redhat.com/
From the console log:
...
[ 145.996408] talos.danny.cz kernel: Adaptec aacraid driver 1.2.1[50983]-custom
[ 145.996732] talos.danny.cz kernel: aacraid 0002:01:00.0: enabling device (0140 -> 0142)
[ 146.010113] talos.danny.cz kernel: nvme nvme0: pci function 0030:0d:00.0
[ 146.010160] talos.danny.cz kernel: nvme 0030:0d:00.0: enabling device (0140 -> 0142)
[ 146.010517] talos.danny.cz kernel: nvme nvme1: pci function 0030:0e:00.0
[ 146.010551] talos.danny.cz kernel: nvme 0030:0e:00.0: enabling device (0140 -> 0142)
[ 146.017051] talos.danny.cz kernel: nvme nvme0: D3 entry latency set to 8 seconds
[ 146.017199] talos.danny.cz kernel: nvme nvme1: D3 entry latency set to 8 seconds
[ 146.034341] talos.danny.cz kernel: aacraid: Comm Interface type3 enabled
[ 146.041633] talos.danny.cz kernel: usb 1-2.2: New USB device found, idVendor=046d, idProduct=c077, bcdDevice=72.00
[ 146.041659] talos.danny.cz kernel: usb 1-2.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 146.041682] talos.danny.cz kernel: usb 1-2.2: Product: USB Optical Mouse
[ 146.041693] talos.danny.cz kernel: usb 1-2.2: Manufacturer: Logitech
[ 146.046181] talos.danny.cz kernel: nvme nvme1: 32/0/0 default/read/poll queues
[ 146.048195] talos.danny.cz kernel: nvme nvme0: 32/0/0 default/read/poll queues
[ 146.051730] talos.danny.cz kernel: nvme1n1: p1
[ 146.053390] talos.danny.cz kernel: nvme0n1: p1
[ 146.053905] talos.danny.cz systemd[1]: Starting modprobe@configfs.service - Load Kernel Module configfs...
[ 146.054112] talos.danny.cz kernel: input: Logitech USB Optical Mouse as /devices/pci0003:00/0003:00:00.0/0003:01:00.0/usb1/1-2/1-2.2/1-2.2:1.0/0003:046D:C077.0002/input/input1
[ 146.054201] talos.danny.cz kernel: AAC0: kernel 3.2-0[0] Apr 24 2017
[ 146.054239] talos.danny.cz kernel: AAC0: monitor 0.0-0[0]
[ 146.054261] talos.danny.cz kernel: AAC0: bios 0.13-209[32000]
[ 146.054285] talos.danny.cz kernel: AAC0: serial 10F447
[ 146.054307] talos.danny.cz kernel: AAC0: Non-DASD support enabled.
[ 146.054331] talos.danny.cz kernel: aacraid 0002:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
[ 146.054361] talos.danny.cz kernel: AAC0: 64bit support enabled.
[ 146.054385] talos.danny.cz kernel: aacraid 0002:01:00.0: 64 Bit DAC enabled
[ 146.054587] talos.danny.cz kernel: hid-generic 0003:046D:C077.0002: input,hidraw1: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-0003:01:00.0-2.2/input0
[ 146.054701] talos.danny.cz systemd[1]: Finished systemd-udev-trigger.service - Coldplug All udev Devices.
[ 146.057858] talos.danny.cz kernel: scsi host8: aacraid
[ 146.059886] talos.danny.cz systemd[1]: modprobe@configfs.service: Deactivated successfully.
[ 146.060225] talos.danny.cz systemd[1]: Finished modprobe@configfs.service - Load Kernel Module configfs.
[ 146.064530] talos.danny.cz kernel: scsi 8:2:0:0: Direct-Access ATA WDC WD5000AAKX-0 1H19 PQ: 0 ANSI: 6
[ 146.083800] talos.danny.cz kernel: sd 8:2:0:0: Attached scsi generic sg3 type 0
[ 146.084962] talos.danny.cz kernel: sd 8:2:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[ 146.086117] talos.danny.cz kernel: scsi 8:2:1:0: Direct-Access ATA WDC WD5000AAKS-0 1D05 PQ: 0 ANSI: 6
[ 146.086715] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Write Protect is off
[ 146.086735] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Mode Sense: 46 00 10 08
[ 146.088650] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 146.109852] talos.danny.cz kernel: sd 8:2:1:0: Attached scsi generic sg4 type 0
[ 146.110789] talos.danny.cz kernel: scsi 8:3:123:0: Enclosure ADAPTEC Smart Adapter 3.02 PQ: 0 ANSI: 5
[ 146.111055] talos.danny.cz kernel: sd 8:2:1:0: [sdc] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[ 146.111521] talos.danny.cz kernel: scsi 8:3:123:0: Attached scsi generic sg5 type 13
[ 146.112133] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Write Protect is off
[ 146.112151] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Mode Sense: 46 00 10 08
[ 146.114638] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 146.127940] talos.danny.cz systemd[1]: Mounting sys-kernel-config.mount - Kernel Configuration File System...
[ 146.128927] talos.danny.cz systemd[1]: Starting dracut-initqueue.service - dracut initqueue hook...
[ 146.129025] talos.danny.cz systemd[1]: systemd-vconsole-setup.service: Deactivated successfully.
[ 146.129101] talos.danny.cz systemd[1]: Stopped systemd-vconsole-setup.service - Virtual Console Setup.
[ 146.129272] talos.danny.cz systemd[1]: Stopping systemd-vconsole-setup.service - Virtual Console Setup...
[ 146.130235] talos.danny.cz systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
[ 146.132522] talos.danny.cz systemd[1]: Mounted sys-kernel-config.mount - Kernel Configuration File System.
[ 146.166186] talos.danny.cz systemd[1]: Reloading requested from client PID 1225 ('systemctl') (unit dracut-initqueue.service)...
[ 146.166230] talos.danny.cz systemd[1]: Reloading...
[ 146.253661] talos.danny.cz kernel: usb 1-2.1.3: new low-speed USB device number 7 using xhci_hcd
[ 146.289125] talos.danny.cz kernel: sdb: sdb1 sdb2 sdb3 sdb4
[ 146.289537] talos.danny.cz kernel: sd 8:2:0:0: [sdb] Attached SCSI disk
[ 146.380101] talos.danny.cz systemd[1]: Reloading finished in 213 ms.
[ 147.119656] talos.danny.cz kernel: usb 1-2.1.3: New USB device found, idVendor=0463, idProduct=ffff, bcdDevice= 0.01
[ 147.119687] talos.danny.cz kernel: usb 1-2.1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 147.119718] talos.danny.cz kernel: usb 1-2.1.3: Product: 5S
[ 147.119730] talos.danny.cz kernel: usb 1-2.1.3: Manufacturer: EATON
[ 147.945736] talos.danny.cz kernel: [drm] amdgpu kernel modesetting enabled.
[ 147.945864] talos.danny.cz kernel: amdgpu: DSDT table not found for OEM information
[ 147.945892] talos.danny.cz kernel: amdgpu: IO link not available for non x86 platforms
[ 147.945906] talos.danny.cz kernel: amdgpu: IO link not available for non x86 platforms
[ 147.945932] talos.danny.cz kernel: amdgpu: Virtual CRAT table created for CPU
[ 147.945979] talos.danny.cz kernel: amdgpu: Topology: Add CPU node
[ 147.946287] talos.danny.cz kernel: amdgpu 0000:01:00.0: enabling device (0540 -> 0542)
[ 147.946319] talos.danny.cz kernel: [drm] initializing kernel modesetting (POLARIS11 0x1002:0x67E3 0x1002:0x0B0D 0x00).
[ 147.946359] talos.danny.cz kernel: [drm] register mmio base: 0x00000000
[ 147.946383] talos.danny.cz kernel: [drm] register mmio size: 262144
[ 147.946579] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 0 <vi_common>
[ 147.946599] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 1 <gmc_v8_0>
[ 147.946618] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 2 <tonga_ih>
[ 147.946637] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 3 <gfx_v8_0>
[ 147.946676] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 4 <sdma_v3_0>
[ 147.946705] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 5 <powerplay>
[ 147.946724] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 6 <dm>
[ 147.946740] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 7 <uvd_v6_0>
[ 147.946759] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: detected ip block number 8 <vce_v3_0>
[ 147.966292] talos.danny.cz kernel: sdc: sdc1 sdc2 sdc3 sdc4
[ 147.966699] talos.danny.cz kernel: sd 8:2:1:0: [sdc] Attached SCSI disk
[ 148.294421] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 148.294444] talos.danny.cz kernel: amdgpu: ATOM BIOS: 113-D0150600-103
[ 148.295816] talos.danny.cz kernel: [drm] UVD is enabled in VM mode
[ 148.295831] talos.danny.cz kernel: [drm] UVD ENC is enabled in VM mode
[ 148.295842] talos.danny.cz kernel: [drm] VCE enabled in VM mode
[ 148.295852] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 148.295878] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
[ 148.295893] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: PCI CONFIG reset
[ 148.296033] talos.danny.cz kernel: [drm] GPU posting now...
[ 148.680650] talos.danny.cz systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.
[ 148.767932] talos.danny.cz kernel: [drm] vm size is 256 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 148.769652] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2 [mem 0x6000010000000-0x60000101fffff 64bit pref]: releasing
[ 148.769677] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0 [mem 0x6000000000000-0x600000fffffff 64bit pref]: releasing
[ 148.769709] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]: releasing
[ 148.769730] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x600017fffffff 64bit pref]: assigned
[ 148.769747] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 0 [mem 0x6000000000000-0x60000ffffffff 64bit pref]: assigned
[ 148.769779] talos.danny.cz kernel: amdgpu 0000:01:00.0: BAR 2 [mem 0x6000100000000-0x60001001fffff 64bit pref]: assigned
[ 148.769802] talos.danny.cz kernel: pci 0000:00:00.0: PCI bridge to [bus 01]
[ 148.769815] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x600c000000000-0x600c07fefffff]
[ 148.769829] talos.danny.cz kernel: pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
[ 148.769849] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 148.769865] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 148.769882] talos.danny.cz kernel: [drm] Detected VRAM RAM=4096M, BAR=4096M
[ 148.769892] talos.danny.cz kernel: [drm] RAM width 128bits GDDR5
[ 148.769901] talos.danny.cz kernel: amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
[ 148.770903] talos.danny.cz kernel: hid-generic 0003:0463:FFFF.0003: hiddev96,hidraw2: USB HID v1.10 Device [EATON 5S] on usb-0003:01:00.0-2.1.3/input0
[ 148.771257] talos.danny.cz kernel: [drm] amdgpu: 4096M of VRAM memory ready
[ 148.771279] talos.danny.cz kernel: [drm] amdgpu: 32600M of GTT memory ready.
[ 148.771352] talos.danny.cz kernel: [drm] GART: num cpu pages 4096, num gpu pages 65536
[ 148.772018] talos.danny.cz kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000).
[ 148.774684] talos.danny.cz kernel: [drm] Chained IB support enabled!
[ 148.787973] talos.danny.cz kernel: amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[ 148.813569] talos.danny.cz kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
[ 148.826195] talos.danny.cz kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[ 148.842658] talos.danny.cz kernel: [drm] Display Core v3.2.325 initialized on DCE 11.2
[ 148.843782] talos.danny.cz systemd[1]: Starting plymouth-start.service - Show Plymouth Boot Screen...
[ 148.846228] talos.danny.cz systemd[1]: Received SIGRTMIN+20 from PID 1344 (plymouthd).
[ 148.998163] talos.danny.cz kernel: usb 1-2.1.4: new full-speed USB device number 8 using xhci_hcd
[ 148.998776] talos.danny.cz kernel: [drm] UVD and UVD ENC initialized successfully.
[ 149.099708] talos.danny.cz kernel: [drm] VCE initialized successfully.
[ 149.107051] talos.danny.cz kernel: kfd kfd: amdgpu: skipped device 1002:67e3, PCI rejects atomics 730<0
[ 149.107085] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 16
[ 149.110945] talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu: Using BACO for runtime pm
[ 149.111687] talos.danny.cz kernel: [drm] Initialized amdgpu 3.63.0 for 0000:01:00.0 on minor 0
[ 149.129106] talos.danny.cz systemd[1]: systemd-vconsole-setup.service: Deactivated successfully.
[ 149.129315] talos.danny.cz systemd[1]: Stopped systemd-vconsole-setup.service - Virtual Console Setup.
[ 149.146843] talos.danny.cz kernel: Console: switching to colour frame buffer device 240x75
[ 149.160686] talos.danny.cz kernel: md/raid1:md127: active with 2 out of 2 mirrors
[ 149.172981] talos.danny.cz kernel: md127: detected capacity change from 0 to 940836864
[ 149.173957] talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 149.174099] talos.danny.cz systemd[1]: Stopping systemd-vconsole-setup.service - Virtual Console Setup...
[ 149.278744] talos.danny.cz kernel: md/raid1:md126: active with 2 out of 2 mirrors
[ 149.292981] talos.danny.cz kernel: md126: detected capacity change from 0 to 2095104
[ 149.331567] talos.danny.cz kernel: usb 1-2.1.4: New USB device found, idVendor=0403, idProduct=6015, bcdDevice=10.00
[ 149.331574] talos.danny.cz kernel: usb 1-2.1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 149.331579] talos.danny.cz kernel: usb 1-2.1.4: Product: FT230X Basic UART
[ 149.331582] talos.danny.cz kernel: usb 1-2.1.4: Manufacturer: FTDI
[ 149.331584] talos.danny.cz kernel: usb 1-2.1.4: SerialNumber: DM02XLCC
[ 149.353523] talos.danny.cz systemd-journald[629]: Journal started
[ 149.353582] talos.danny.cz systemd-journald[629]: Runtime Journal (/run/log/journal/d94ac98ea91043d3892dab218d99209d) is 8.0M, max 1.2G, 1.2G free.
[ 149.360328] talos.danny.cz systemd-vconsole-setup[632]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring.
[ 149.360817] talos.danny.cz systemd-modules-load[631]: Inserted module 'fuse'
[ 149.361170] talos.danny.cz systemd-modules-load[631]: Inserted module 'i2c_dev'
[ 149.361370] talos.danny.cz systemd-modules-load[631]: Inserted module 'ip_tables'
[ 149.361419] talos.danny.cz systemd-vconsole-setup[640]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version
[ 149.361697] talos.danny.cz systemd-modules-load[631]: Inserted module 'ip6_tables'
[ 149.361983] talos.danny.cz systemd-modules-load[631]: Failed to find module 'scsi_dh_alua'
[ 149.362176] talos.danny.cz systemd-modules-load[631]: Failed to find module 'scsi_dh_emc'
[ 149.362453] talos.danny.cz systemd-modules-load[631]: Failed to find module 'scsi_dh_rdac'
[ 149.362539] talos.danny.cz dracut-cmdline[659]: dracut-102-2.fc40
[ 149.362539] talos.danny.cz dracut-cmdline[659]: Using kernel command line parameters: root=/dev/mapper/Linux-Root ro rd.lvm.lv=Linux/Root rd.md.uuid=60936c65:08d9f6bc:b191c895:332a4d53 rd.md.uuid=06128381:0df3ab4b:02ebd84d:84921066 rd.md.uuid=3c52d341:6485ed32:9da81f4c:706b231f console=tty1 console=hvc0
[ 149.362878] talos.danny.cz systemd-sysusers[650]: Creating group 'nobody' with GID 65534.
[ 149.363123] talos.danny.cz systemd-sysusers[650]: Creating group 'users' with GID 100.
[ 149.363316] talos.danny.cz systemd-sysusers[650]: Creating group 'systemd-journal' with GID 190.
[ 149.363503] talos.danny.cz systemd-vconsole-setup[632]: Setting source virtual console failed, ignoring remaining ones.
[ 149.363694] talos.danny.cz systemd-udevd[753]: Using default interface naming scheme 'v255'.
[ 149.366236] talos.danny.cz systemd-vconsole-setup[1224]: setfont: ERROR kdfontop.c:183 put_font_kdfontop: Unable to load such font with such kernel version
[ 149.414053] talos.danny.cz systemd[1]: Starting systemd-vconsole-setup.service - Virtual Console Setup...
[ 149.415317] talos.danny.cz systemd[1]: Started systemd-journald.service - Journal Service.
[ 149.384617] talos.danny.cz systemd-vconsole-setup[1199]: /usr/bin/setfont failed with a "system error" (EX_OSERR), ignoring.
[ 149.384730] talos.danny.cz systemd-vconsole-setup[1199]: Setting source virtual console failed, ignoring remaining ones.
[ 149.419101] talos.danny.cz systemd[1]: Starting systemd-tmpfiles-setup.service - Create System Files and Directories...
[ 149.449593] talos.danny.cz systemd-tmpfiles[1510]: /usr/lib/tmpfiles.d/var.conf:14: Duplicate line for path "/var/log", ignoring.
[ 149.464643] talos.danny.cz systemd[1]: Finished systemd-tmpfiles-setup.service - Create System Files and Directories.
[ 149.774957] talos.danny.cz systemd[1]: Started plymouth-start.service - Show Plymouth Boot Screen.
[ 149.775823] talos.danny.cz systemd[1]: systemd-ask-password-console.path - Dispatch Password Requests to Console Directory Watch was skipped because of an unmet condition check (ConditionPathExists=!/run/plymouth/pid).
[ 149.776071] talos.danny.cz systemd[1]: Started systemd-ask-password-plymouth.path - Forward Password Requests to Plymouth Directory Watch.
[ 149.776506] talos.danny.cz systemd[1]: Reached target paths.target - Path Units.
[ 149.860914] talos.danny.cz systemd[1]: Finished systemd-vconsole-setup.service - Virtual Console Setup.
[ 149.861277] talos.danny.cz systemd[1]: Reached target sysinit.target - System Initialization.
[ 149.862051] talos.danny.cz systemd[1]: Reached target basic.target - Basic System.
[ 149.950478] talos.danny.cz systemd[1]: Started rngd.service - Hardware RNG Entropy Gatherer Daemon.
[ 149.957295] talos.danny.cz rngd[1525]: Disabling 7: PKCS11 Entropy generator (pkcs11)
[ 149.957295] talos.danny.cz rngd[1525]: Disabling 5: NIST Network Entropy Beacon (nist)
[ 149.957295] talos.danny.cz rngd[1525]: Initializing available sources
[ 149.957880] talos.danny.cz rngd[1525]: [hwrng ]: Initialization Failed
[ 149.960647] talos.danny.cz rngd[1525]: [darn ]: Enabling power DARN rng support
[ 149.960647] talos.danny.cz rngd[1525]: [darn ]: Initialized
[ 149.960647] talos.danny.cz rngd[1525]: [jitter]: JITTER timeout set to 5 sec
[ 150.069593] talos.danny.cz rngd[1525]: [jitter]: Initializing AES buffer
[ 155.000031] talos.danny.cz rngd[1525]: [jitter]: Unable to obtain AES key, disabling JITTER source
[ 155.000886] talos.danny.cz rngd[1525]: [jitter]: Initialization Failed
[ 155.008927] talos.danny.cz rngd[1525]: [rtlsdr]: Initialization Failed
[ 155.008927] talos.danny.cz rngd[1525]: [namedpipe]: Initialization Failed
[ 159.883963] talos.danny.cz kernel: pci 0032:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.884630] talos.danny.cz kernel: pci 0033:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.885271] talos.danny.cz kernel: pci 0000:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.885892] talos.danny.cz kernel: pci 0001:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.886511] talos.danny.cz kernel: pci 0002:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.887129] talos.danny.cz kernel: pci 0003:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.887742] talos.danny.cz kernel: pci 0004:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.888355] talos.danny.cz kernel: pci 0005:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.888958] talos.danny.cz kernel: pci 0005:01:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.889557] talos.danny.cz kernel: pci 0030:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.890176] talos.danny.cz kernel: pci 0030:01:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.890762] talos.danny.cz kernel: pci 0030:02:04.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.891344] talos.danny.cz kernel: pci 0030:02:05.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.891946] talos.danny.cz kernel: pci 0030:02:06.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.892538] talos.danny.cz kernel: pci 0030:02:07.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 159.893121] talos.danny.cz kernel: pci 0031:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller@0
[ 176.682101] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fb
[ 176.682114] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A
[ 176.682116] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fb detected
[ 176.682118] talos.danny.cz kernel: EEH: Call Trace:
[ 176.682119] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150
[ 176.682129] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0
[ 176.682136] talos.danny.cz kernel: EEH: [000000000896f909] nvme_timeout+0x264/0x670 [nvme]
[ 176.682145] talos.danny.cz kernel: EEH: [000000002faff0a8] blk_mq_handle_expired+0xb0/0x130
[ 176.682151] talos.danny.cz kernel: EEH: [00000000014f13c7] bt_iter+0xf8/0x140
[ 176.682156] talos.danny.cz kernel: EEH: [0000000072f9f2ba] blk_mq_queue_tag_busy_iter+0x384/0x680
[ 176.682160] talos.danny.cz kernel: EEH: [00000000449486be] blk_mq_timeout_work+0x198/0x1f0
[ 176.682164] talos.danny.cz kernel: EEH: [00000000a8845314] process_one_work+0x1f8/0x510
[ 176.682170] talos.danny.cz kernel: EEH: [00000000a19f763f] worker_thread+0x33c/0x510
[ 176.682173] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160
[ 176.682178] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18
[ 176.682182] talos.danny.cz kernel: EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[ 176.682185] talos.danny.cz kernel: EEH: Notify device drivers to shutdown
[ 176.682187] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)'
[ 176.682191] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->error_detected(IO frozen)
[ 176.682198] talos.danny.cz kernel: nvme nvme0: frozen state error detected, reset controller
[ 176.934877] talos.danny.cz kernel: nvme0n1: I/O Cmd(0x2) @ LBA 1875384832, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 176.934893] talos.danny.cz kernel: I/O error, dev nvme0n1, sector 1875384832 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 176.954198] talos.danny.cz kernel: nvme nvme0: Failed to get ANA log: -4
[ 176.973671] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'need reset'
[ 176.973677] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 176.973683] talos.danny.cz kernel: EEH: Collect temporary log
[ 176.973712] talos.danny.cz kernel: EEH: of node=0030:0d:00.0
[ 176.973716] talos.danny.cz kernel: EEH: PCI device/vendor: a808144d
[ 176.973719] talos.danny.cz kernel: EEH: PCI cmd/status register: 00100142
[ 176.973721] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
[ 176.973732] talos.danny.cz kernel: EEH: PCI-E 00: 0002b010 10648fc1 00002830 00437043
[ 176.973742] talos.danny.cz kernel: EEH: PCI-E 10: 10430000 00000000 00000000 00000000
[ 176.977408] talos.danny.cz kernel: EEH: PCI-E 20: 00000000
[ 176.977409] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
[ 176.977421] talos.danny.cz kernel: EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
[ 176.978606] talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 0000e000 000003e0 00000000
[ 176.978615] talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 176.978618] talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000
[ 176.979843] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1)
[ 176.979845] talos.danny.cz kernel: brdgCtl: 00000002
[ 176.979847] talos.danny.cz kernel: RootSts: 00060020 00402000 a0830008 00100107 00000800
[ 176.979849] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000
[ 176.981515] talos.danny.cz kernel: Lem: 0000000100000080 0000000000000000 0000000000000080
[ 176.981517] talos.danny.cz kernel: PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000
[ 176.981520] talos.danny.cz kernel: RxeTceErr: 2000000000000000 2000000000000000 c0000000000001fa 0000000000000000
[ 176.981522] talos.danny.cz kernel: PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000
[ 176.981524] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 8800000c00000000 0000000007011000
[ 176.981529] talos.danny.cz kernel: PE[1fa] A/B: 8300b03800000000 8000000000000000
[ 176.981532] talos.danny.cz kernel: PE[..1fb] A/B: as above
[ 176.981533] talos.danny.cz kernel: EEH: Reset without hotplug activity
[ 180.403658] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 1023ms after bus reset; waiting
[ 181.483654] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 2047ms after bus reset; waiting
[ 183.563680] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 4095ms after bus reset; waiting
[ 187.723654] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 8191ms after bus reset; waiting
[ 196.364131] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 16383ms after bus reset; waiting
[ 213.004161] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 32767ms after bus reset; waiting
[ 246.283652] talos.danny.cz kernel: nvme 0030:0d:00.0: not ready 65535ms after bus reset; giving up
[ 246.235672] talos.danny.cz systemd-udevd[753]: nvme0n1: Worker [958] processing SEQNUM=1898 is taking a long time
[ 246.236577] talos.danny.cz systemd-udevd[753]: nvme1n1: Worker [921] processing SEQNUM=1893 is taking a long time
[ 246.552477] talos.danny.cz kernel: EEH: Beginning: 'slot_reset'
[ 246.553820] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->slot_reset()
[ 246.553838] talos.danny.cz kernel: nvme nvme0: restart after slot reset
[ 246.629560] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'recovered'
[ 246.629565] talos.danny.cz kernel: EEH: Finished:'slot_reset' with aggregate recovery state:'recovered'
[ 246.630292] talos.danny.cz kernel: EEH: Notify device driver to resume
[ 246.630659] talos.danny.cz kernel: EEH: Beginning: 'resume'
[ 246.633097] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->resume()
[ 246.683699] talos.danny.cz kernel: nvme 0030:0d:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 246.684963] talos.danny.cz kernel: nvme nvme0: Disabling device after reset failure: -19
[ 246.744271] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211552, async page read
[ 246.745010] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384848, nr_sectors = 16 limit=0
[ 246.746281] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211553, async page read
[ 246.746929] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384864, nr_sectors = 16 limit=0
[ 246.748195] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211554, async page read
[ 246.748838] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384880, nr_sectors = 16 limit=0
[ 246.750125] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211555, async page read
[ 246.750778] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384896, nr_sectors = 16 limit=0
[ 246.752076] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211556, async page read
[ 246.752725] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384912, nr_sectors = 16 limit=0
[ 246.754025] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211557, async page read
[ 246.754792] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'none'
[ 246.754800] talos.danny.cz kernel: EEH: Finished:'resume'
[ 246.754804] talos.danny.cz kernel: EEH: Recovery successful.
[ 246.754809] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fb
[ 246.754820] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A
[ 246.754823] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fb detected
[ 246.754827] talos.danny.cz kernel: EEH: Call Trace:
[ 246.754829] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150
[ 246.754842] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0
[ 246.754853] talos.danny.cz kernel: EEH: [000000007534829c] pnv_pci_read_config+0x148/0x180
[ 246.754859] talos.danny.cz kernel: EEH: [000000006db2aa5c] pci_bus_read_config_dword+0x90/0xf0
[ 246.754866] talos.danny.cz kernel: EEH: [00000000ed12774a] pci_find_next_ext_capability+0x5c/0x150
[ 246.754874] talos.danny.cz kernel: EEH: [000000003a1ec347] pci_restore_ltr_state+0x40/0xa0
[ 246.754882] talos.danny.cz kernel: EEH: [000000000c3c04be] pci_restore_state.part.0+0x2c/0x3b0
[ 246.754888] talos.danny.cz kernel: EEH: [000000005a69e2f4] nvme_slot_reset+0x48/0x90 [nvme]
[ 246.754899] talos.danny.cz kernel: EEH: [00000000efdedb77] eeh_report_reset+0xd0/0x100
[ 246.754905] talos.danny.cz kernel: EEH: [00000000bdb52d8d] eeh_pe_report+0x2bc/0x548
[ 246.754911] talos.danny.cz kernel: EEH: [0000000039694420] eeh_handle_normal_event+0x89c/0x9c0
[ 246.754918] talos.danny.cz kernel: EEH: [00000000bbf75c7c] eeh_event_handler+0xfc/0x170
[ 246.754924] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160
[ 246.754932] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18
[ 246.754939] talos.danny.cz kernel: EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 246.754942] talos.danny.cz kernel: EEH: Notify device drivers to shutdown
[ 246.754945] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)'
[ 246.754949] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->error_detected(IO frozen)
[ 246.754956] talos.danny.cz kernel: nvme nvme0: frozen state error detected, reset controller
[ 246.756867] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384928, nr_sectors = 16 limit=0
[ 246.775171] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211558, async page read
[ 246.775177] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme0n1: rw=0, sector=1875384944, nr_sectors = 16 limit=0
[ 246.775181] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1, logical block 117211559, async page read
[ 246.778998] talos.danny.cz kernel: Buffer I/O error on dev nvme0n1p1, logical block 8191999, async page read
[ 246.853703] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'disconnect'
[ 246.853709] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'disconnect'
[ 246.855030] talos.danny.cz kernel: EEH: Unable to recover from failure from PHB#30-PE#1fb.
Please try reseating or replacing it
[ 246.856319] talos.danny.cz kernel: EEH: of node=0030:0d:00.0
[ 246.856938] talos.danny.cz kernel: EEH: PCI device/vendor: ffffffff
[ 246.857564] talos.danny.cz kernel: EEH: PCI cmd/status register: ffffffff
[ 246.858192] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
[ 246.858829] talos.danny.cz kernel: EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
[ 246.859465] talos.danny.cz kernel: EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
[ 246.860081] talos.danny.cz kernel: EEH: PCI-E 20: ffffffff
[ 246.860727] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
[ 246.861354] talos.danny.cz kernel: EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
[ 246.862027] talos.danny.cz kernel: EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
[ 246.862849] talos.danny.cz kernel: EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
[ 246.863450] talos.danny.cz kernel: EEH: PCI-E AER 30: ffffffff ffffffff
[ 246.864068] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1)
[ 246.864682] talos.danny.cz kernel: brdgCtl: 00000002
[ 246.864685] talos.danny.cz kernel: RootSts: 00000020 00402000 a0830008 00100107 00002000
[ 246.864689] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000
[ 246.864692] talos.danny.cz kernel: Lem: 0000000100280000 0000000000000000 0000000100000000
[ 246.864696] talos.danny.cz kernel: PhbErr: 0000088000000000 0000008000000000 2148000098000240 a008400000000000
[ 246.864700] talos.danny.cz kernel: RxeArbErr: 4000200000000000 0000200000000000 02409fde30000000 0000000000000000
[ 246.864703] talos.danny.cz kernel: PblErr: 0000000001000000 0000000001000000 0000000000000000 0000000000000000
[ 246.864707] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 61000c4800000000 0000000000000000
[ 246.864714] talos.danny.cz kernel: PE[1fa] A/B: 8300b03800000000 8000000000000000
[ 246.870066] talos.danny.cz kernel: PE[1fb] A/B: b740002a02300000 8000000000000000
[ 246.870071] talos.danny.cz kernel: EEH: Beginning: 'error_detected(permanent failure)'
[ 246.873646] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: Invoking nvme->error_detected(permanent failure)
[ 246.873656] talos.danny.cz kernel: nvme nvme0: failure state error detected, request disconnect
[ 246.873701] talos.danny.cz kernel: PCI 0030:0d:00.0#01fb: EEH: nvme driver reports: 'disconnect'
[ 246.873708] talos.danny.cz kernel: EEH: Finished:'error_detected(permanent failure)'
[ 247.204789] talos.danny.cz kernel: pci 0030:0d : [PE# 1fb] Releasing PE
[ 247.205905] talos.danny.cz kernel: pci 0030:0d : [PE# 1fb] Removing DMA window #0
[ 247.206466] talos.danny.cz kernel: pci 0030:0d : [PE# 1fb] Disabling 64-bit DMA bypass
[ 247.211359] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fa
[ 247.211909] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A
[ 247.212456] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fa detected
[ 247.213000] talos.danny.cz kernel: EEH: Call Trace:
[ 247.213792] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150
[ 247.214363] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0
[ 247.214373] talos.danny.cz kernel: EEH: [000000000896f909] nvme_timeout+0x264/0x670 [nvme]
[ 247.214382] talos.danny.cz kernel: EEH: [000000002faff0a8] blk_mq_handle_expired+0xb0/0x130
[ 247.214389] talos.danny.cz kernel: EEH: [00000000014f13c7] bt_iter+0xf8/0x140
[ 247.216598] talos.danny.cz kernel: EEH: [0000000072f9f2ba] blk_mq_queue_tag_busy_iter+0x384/0x680
[ 247.216605] talos.danny.cz kernel: EEH: [00000000449486be] blk_mq_timeout_work+0x198/0x1f0
[ 247.216610] talos.danny.cz kernel: EEH: [00000000a8845314] process_one_work+0x1f8/0x510
[ 247.218312] talos.danny.cz kernel: EEH: [00000000a19f763f] worker_thread+0x33c/0x510
[ 247.218318] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160
[ 247.218325] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18
[ 247.219962] talos.danny.cz kernel: EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
[ 247.219965] talos.danny.cz kernel: EEH: Notify device drivers to shutdown
[ 247.219968] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)'
[ 247.223662] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(IO frozen)
[ 247.224563] talos.danny.cz kernel: nvme nvme1: frozen state error detected, reset controller
[ 247.455003] talos.danny.cz kernel: nvme1n1: I/O Cmd(0x2) @ LBA 1875384832, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 247.455611] talos.danny.cz kernel: I/O error, dev nvme1n1, sector 1875384832 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 247.484222] talos.danny.cz kernel: nvme nvme1: Failed to get ANA log: -4
[ 247.513681] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'need reset'
[ 247.513686] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 247.515019] talos.danny.cz kernel: EEH: Collect temporary log
[ 247.515659] talos.danny.cz kernel: EEH: of node=0030:0e:00.0
[ 247.516286] talos.danny.cz kernel: EEH: PCI device/vendor: a808144d
[ 247.516920] talos.danny.cz kernel: EEH: PCI cmd/status register: 00100142
[ 247.517567] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
[ 247.518224] talos.danny.cz kernel: EEH: PCI-E 00: 0002b010 10648fc1 00002830 00437043
[ 247.518883] talos.danny.cz kernel: EEH: PCI-E 10: 10430000 00000000 00000000 00000000
[ 247.519526] talos.danny.cz kernel: EEH: PCI-E 20: 00000000
[ 247.520166] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
[ 247.520810] talos.danny.cz kernel: EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
[ 247.521440] talos.danny.cz kernel: EEH: PCI-E AER 10: 00000000 0000e000 000003e0 00000000
[ 247.522068] talos.danny.cz kernel: EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 247.522692] talos.danny.cz kernel: EEH: PCI-E AER 30: 00000000 00000000
[ 247.523340] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1)
[ 247.523977] talos.danny.cz kernel: brdgCtl: 00000002
[ 247.524824] talos.danny.cz kernel: RootSts: 00060020 00402000 a0830008 00100107 00000800
[ 247.524828] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000
[ 247.524830] talos.danny.cz kernel: Lem: 0000000100000080 0000000000000000 0000000000000080
[ 247.524833] talos.danny.cz kernel: PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000
[ 247.524836] talos.danny.cz kernel: RxeTceErr: 2000000000000000 2000000000000000 c0000000000001fa 0000000000000000
[ 247.524838] talos.danny.cz kernel: PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000
[ 247.524841] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 8800000c00000000 0000000007011000
[ 247.524846] talos.danny.cz kernel: PE[1fa] A/B: 8300b03800000000 8000000000000000
[ 247.524849] talos.danny.cz kernel: PE[..1fb] A/B: as above
[ 247.524851] talos.danny.cz kernel: EEH: Reset without hotplug activity
[ 250.964186] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 1023ms after bus reset; waiting
[ 252.044182] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 2047ms after bus reset; waiting
[ 254.124189] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 4095ms after bus reset; waiting
[ 258.284197] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 8191ms after bus reset; waiting
[ 266.764192] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 16383ms after bus reset; waiting
[ 283.404208] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 32767ms after bus reset; waiting
[ 316.684204] talos.danny.cz kernel: nvme 0030:0e:00.0: not ready 65535ms after bus reset; giving up
[ 316.638265] talos.danny.cz dracut-initqueue[1340]: Timed out for waiting the udev queue being empty.
[ 316.952931] talos.danny.cz kernel: EEH: Beginning: 'slot_reset'
[ 316.953821] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->slot_reset()
[ 316.953841] talos.danny.cz kernel: nvme nvme1: restart after slot reset
[ 317.026731] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'recovered'
[ 317.026736] talos.danny.cz kernel: EEH: Finished:'slot_reset' with aggregate recovery state:'recovered'
[ 317.027508] talos.danny.cz kernel: EEH: Notify device driver to resume
[ 317.027891] talos.danny.cz kernel: EEH: Beginning: 'resume'
[ 317.030352] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->resume()
[ 317.083703] talos.danny.cz kernel: nvme 0030:0e:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 317.085001] talos.danny.cz kernel: nvme nvme1: Disabling device after reset failure: -19
[ 317.144264] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211552, async page read
[ 317.144465] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'none'
[ 317.144957] talos.danny.cz kernel: EEH: Finished:'resume'
[ 317.144965] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384848, nr_sectors = 16 limit=0
[ 317.144980] talos.danny.cz kernel: EEH: Recovery successful.
[ 317.144989] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211553, async page read
[ 317.145018] talos.danny.cz kernel: EEH: Recovering PHB#30-PE#1fa
[ 317.145025] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384864, nr_sectors = 16 limit=0
[ 317.145053] talos.danny.cz kernel: EEH: PE location: N/A, PHB location: N/A
[ 317.145056] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211554, async page read
[ 317.145102] talos.danny.cz kernel: EEH: Frozen PHB#30-PE#1fa detected
[ 317.145131] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384880, nr_sectors = 16 limit=0
[ 317.145171] talos.danny.cz kernel: EEH: Call Trace:
[ 317.145191] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211555, async page read
[ 317.145239] talos.danny.cz kernel: EEH: [0000000043837525] __eeh_send_failure_event+0x78/0x150
[ 317.145266] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384896, nr_sectors = 16 limit=0
[ 317.145297] talos.danny.cz kernel: EEH: [00000000ca9d8fff] eeh_dev_check_failure+0x2e0/0x6d0
[ 317.145318] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211556, async page read
[ 317.145322] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384912, nr_sectors = 16 limit=0
[ 317.145362] talos.danny.cz kernel: EEH: [000000007534829c] pnv_pci_read_config+0x148/0x180
[ 317.145392] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211557, async page read
[ 317.145415] talos.danny.cz kernel: EEH: [000000006db2aa5c] pci_bus_read_config_dword+0x90/0xf0
[ 317.145463] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384928, nr_sectors = 16 limit=0
[ 317.145482] talos.danny.cz kernel: EEH: [00000000ed12774a] pci_find_next_ext_capability+0x5c/0x150
[ 317.145514] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211558, async page read
[ 317.145544] talos.danny.cz kernel: EEH: [000000003a1ec347] pci_restore_ltr_state+0x40/0xa0
[ 317.145591] talos.danny.cz kernel: (udev-worker): attempt to access beyond end of device
nvme1n1: rw=0, sector=1875384944, nr_sectors = 16 limit=0
[ 317.145620] talos.danny.cz kernel: EEH: [000000000c3c04be] pci_restore_state.part.0+0x2c/0x3b0
[ 317.145652] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1, logical block 117211559, async page read
[ 317.145682] talos.danny.cz kernel: EEH: [000000005a69e2f4] nvme_slot_reset+0x48/0x90 [nvme]
[ 317.148788] talos.danny.cz kernel: Buffer I/O error on dev nvme1n1p1, logical block 8191999, async page read
[ 317.149262] talos.danny.cz kernel: EEH: [00000000efdedb77] eeh_report_reset+0xd0/0x100
[ 317.164254] talos.danny.cz kernel: EEH: [00000000bdb52d8d] eeh_pe_report+0x2bc/0x548
[ 317.164259] talos.danny.cz kernel: EEH: [0000000039694420] eeh_handle_normal_event+0x89c/0x9c0
[ 317.164262] talos.danny.cz kernel: EEH: [00000000bbf75c7c] eeh_event_handler+0xfc/0x170
[ 317.164264] talos.danny.cz kernel: EEH: [00000000786c4402] kthread+0x150/0x160
[ 317.164269] talos.danny.cz kernel: EEH: [00000000393b9885] start_kernel_thread+0x14/0x18
[ 317.164271] talos.danny.cz kernel: EEH: This PCI device has failed 2 times in the last hour and will be permanently disabled after 5 failures.
[ 317.164273] talos.danny.cz kernel: EEH: Notify device drivers to shutdown
[ 317.164276] talos.danny.cz kernel: EEH: Beginning: 'error_detected(IO frozen)'
[ 317.164278] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(IO frozen)
[ 317.164281] talos.danny.cz kernel: nvme nvme1: frozen state error detected, reset controller
[ 317.153365] talos.danny.cz dracut-initqueue[1582]: Scanning devices md127 sda2 for LVM logical volumes Linux/Root
[ 317.172717] talos.danny.cz dracut-initqueue[1603]: WARNING: Couldn't find device with uuid MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj.
[ 317.172717] talos.danny.cz dracut-initqueue[1603]: WARNING: VG Linux is missing PV MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj (last written to /dev/md0).
[ 317.283754] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'disconnect'
[ 317.283760] talos.danny.cz kernel: EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'disconnect'
[ 317.285796] talos.danny.cz kernel: EEH: Unable to recover from failure from PHB#30-PE#1fa.
Please try reseating or replacing it
[ 317.287465] talos.danny.cz kernel: EEH: of node=0030:0e:00.0
[ 317.287983] talos.danny.cz kernel: EEH: PCI device/vendor: ffffffff
[ 317.288503] talos.danny.cz kernel: EEH: PCI cmd/status register: ffffffff
[ 317.289009] talos.danny.cz kernel: EEH: PCI-E capabilities and status follow:
[ 317.289538] talos.danny.cz kernel: EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
[ 317.290053] talos.danny.cz kernel: EEH: PCI-E 10: ffffffff ffffffff ffffffff ffffffff
[ 317.290550] talos.danny.cz kernel: EEH: PCI-E 20: ffffffff
[ 317.291068] talos.danny.cz kernel: EEH: PCI-E AER capability register set follows:
[ 317.291574] talos.danny.cz kernel: EEH: PCI-E AER 00: ffffffff ffffffff ffffffff ffffffff
[ 317.292072] talos.danny.cz kernel: EEH: PCI-E AER 10: ffffffff ffffffff ffffffff ffffffff
[ 317.292554] talos.danny.cz kernel: EEH: PCI-E AER 20: ffffffff ffffffff ffffffff ffffffff
[ 317.293038] talos.danny.cz kernel: EEH: PCI-E AER 30: ffffffff ffffffff
[ 317.293501] talos.danny.cz kernel: PHB4 PHB#48 Diag-data (Version: 1)
[ 317.293959] talos.danny.cz kernel: brdgCtl: 00000002
[ 317.294510] talos.danny.cz kernel: RootSts: 00000020 00402000 a0830008 00100107 00002000
[ 317.294512] talos.danny.cz kernel: PhbSts: 0000001c00000000 0000001c00000000
[ 317.294514] talos.danny.cz kernel: Lem: 0000000100280000 0000000000000000 0000000100000000
[ 317.294516] talos.danny.cz kernel: PhbErr: 0000088000000000 0000008000000000 2148000098000240 a008400000000000
[ 317.294518] talos.danny.cz kernel: RxeArbErr: 4000200000000000 0000200000000000 02409fde30000000 0000000000000000
[ 317.294519] talos.danny.cz kernel: PblErr: 0000000001000000 0000000001000000 0000000000000000 0000000000000000
[ 317.294521] talos.danny.cz kernel: RegbErr: 0000004000000000 0000004000000000 61000c4800000000 0000000000000000
[ 317.294525] talos.danny.cz kernel: PE[1fa] A/B: b740002a02380000 8000000000000000
[ 317.294526] talos.danny.cz kernel: PE[1fb] A/B: af40000c00000000 800000000e000010
[ 317.294529] talos.danny.cz kernel: EEH: Beginning: 'error_detected(permanent failure)'
[ 317.297129] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(permanent failure)
[ 317.299703] talos.danny.cz kernel: nvme nvme1: failure state error detected, request disconnect
[ 317.302102] talos.danny.cz kernel: PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'disconnect'
[ 317.340517] talos.danny.cz dracut-initqueue[1582]: Linux/Root linear
[ 317.353416] talos.danny.cz dracut-initqueue[1607]: WARNING: Couldn't find device with uuid MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj.
[ 317.353877] talos.danny.cz dracut-initqueue[1607]: WARNING: VG Linux is missing PV MFWATY-ZceS-Qeub-HcO3-ZAv2-0Q2D-jDpFwj (last written to /dev/md0).
[ 317.353877] talos.danny.cz dracut-initqueue[1607]: Refusing activation of partial LV Linux/Root. Use '--activationmode partial' to override.
[ 317.302106] talos.danny.cz kernel: EEH: Finished:'error_detected(permanent failure)'
[ 317.664194] talos.danny.cz kernel: pci 0030:0e : [PE# 1fa] Releasing PE
[ 317.665913] talos.danny.cz kernel: pci 0030:0e : [PE# 1fa] Removing DMA window #0
[ 317.666648] talos.danny.cz kernel: pci 0030:0e : [PE# 1fa] Disabling 64-bit DMA bypass