PROBLEM: XPG Gammix S70 Blade PCIe Gen 4 NVMe drive unusable in Linux
Warren Chartier
icebalm at icebalm.com
Sun Feb 6 13:59:56 PST 2022
Summary: XPG Gammix S70 Blade PCIe Gen 4 NVMe drive unusable in Linux
Full Description:
The XPG Gammix S70 Blade 1TB PCIe Gen 4.0 NVMe drive is detected by the Linux kernel; however, when block operations are performed on it, these errors are generated:
[ 3.958786] nvme 0000:0e:00.0: invalid VPD tag 0xff (size 65535) at offset 7
[ 71.726420] nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[ 71.793517] block nvme1n1: no usable path - requeuing I/O
[ 71.793523] block nvme1n1: no usable path - requeuing I/O
[ 71.793525] block nvme1n1: no usable path - requeuing I/O
[ 71.793527] block nvme1n1: no usable path - requeuing I/O
[ 71.793528] block nvme1n1: no usable path - requeuing I/O
[ 71.816389] nvme 0000:0e:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 71.816527] nvme nvme1: Removing after probe failure status: -19
[ 71.856406] block nvme1n1: no available path - failing I/O
[ 71.856425] block nvme1n1: no available path - failing I/O
[ 71.856429] block nvme1n1: no available path - failing I/O
[ 71.856432] block nvme1n1: no available path - failing I/O
[ 71.856435] block nvme1n1: no available path - failing I/O
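As a rough aid for anyone triaging similar logs, the sketch below scans kernel log text for the "controller is down" line shown above and flags the case where both CSTS and PCI_STATUS read back as all ones. All-ones register reads are what PCI typically returns when a device has stopped responding, which is consistent with the later "config space inaccessible" message. The function name and regular expression are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Matches the nvme driver's "controller is down" message as it appears
# in the log above. The pattern is an assumption based on that one line.
CTRL_DOWN = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def find_all_ones_resets(log_text):
    """Return (controller, csts, pci_status) for each reset line whose
    CSTS and PCI_STATUS both read as all ones."""
    hits = []
    for m in CTRL_DOWN.finditer(log_text):
        csts = int(m.group("csts"), 16)
        pci = int(m.group("pci"), 16)
        if csts == 0xFFFFFFFF and pci == 0xFFFF:
            hits.append((m.group("ctrl"), csts, pci))
    return hits


# The line from the report above, used as sample input.
sample = (
    "[ 71.726420] nvme nvme1: controller is down; will reset: "
    "CSTS=0xffffffff, PCI_STATUS=0xffff"
)
print(find_all_ones_resets(sample))
```

In practice the input could be the saved output of dmesg; a reset line with ordinary register values is ignored by this check.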
Some block operations appear to succeed, since the kernel is at least able to read the partition table from the drive:
[ 0.672616] nvme nvme1: pci function 0000:0e:00.0
[ 0.680459] nvme nvme1: 32/0/0 default/read/poll queues
[ 0.682463] nvme1n1: p1 p2 p3 p4
However, any user operation, such as running a partition editor or attempting to mount a filesystem, triggers the errors shown earlier, and the drive becomes unusable.
This drive works perfectly fine in Windows 10 on the same system. The drive also works fine in a PlayStation 5.
Keywords: nvme kernel
Kernel version: Linux version 5.16.5-arch1-1 (linux@archlinux) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Tue, 01 Feb 2022 21:42:50 +0000
Software:
GNU C 11.1.0
GNU Make 4.3
Binutils 2.36.1
Util-linux 2.37.3
Mount 2.37.3
Module-init-tools 29
E2fsprogs 1.46.5
Xfsprogs 5.14.2
PPP 2.4.9
Bison 3.8.2
Flex 2.6.4
Linux C++ Library 6.0.29
Linux C Library 2.33
Dynamic linker (ldd) 2.33
Procps 3.3.17
Kbd 2.4.0
Console-tools 2.4.0
Sh-utils 9.0
Udev 250
Modules Loaded acpi_cpufreq aesni_intel af_alg algif_hash algif_skcipher be2net blake2b_generic bluetooth bnep bpf_preload bridge btbcm btintel btrfs btrtl btusb ccp cdrom cfg80211 cmac crc16 crc32c_generic crc32c_intel crc32_pclmul crct10dif_pclmul cryptd crypto_simd crypto_user dca dm_mod ecdh_generic edac_mce_amd ext4 fat fuse ghash_clmulni_intel hfs hfsplus i2c_piix4 igb intel_rapl_common intel_rapl_msr ip6table_filter ip6_tables iptable_filter ip_tables irqbypass iwlmvm iwlwifi jbd2 jfs joydev k10temp kvm kvm_amd libarc4 libcrc32c llc mac80211 mac_hid mbcache mc minix mousedev msdos mxm_wmi nls_iso8859_1 nvidia nvidia_drm nvidia_modeset nvidia_uvm pcspkr pinctrl_amd raid6_pq rapl rfcomm rfkill rng_core sg snd snd_hda_codec snd_hda_codec_hdmi snd_hda_core snd_hda_intel snd_hrtimer snd_hwdep snd_intel_dspcfg snd_intel_sdw_acpi snd_pcm snd_rawmidi snd_seq snd_seq_device snd_seq_dummy snd_timer snd_usb_audio snd_usbmidi_lib soundcore sp5100_tco stp ufs usbhid uvcvideo vfat vfio vfio_iommu_type1 vfio_pci vfio_pci_core vfio_virqfd videobuf2_common videobuf2_memops videobuf2_v4l2 videobuf2_vmalloc videodev wmi wmi_bmof xfs xhci_pci xhci_pci_renesas xor x_tables
Processor Information:
processor : 0
vendor_id : AuthenticAMD
cpu family : 23
model : 113
model name : AMD Ryzen 7 3800X 8-Core Processor
stepping : 0
microcode : 0x8701021
cpu MHz : 2200.000
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
bogomips : 7803.32
TLB size : 3072 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 43 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
lspci -vvv for the offending NVMe drive after a cold boot, before trying to access it:
0e:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. Device 5236 (rev 01) (prog-if 02 [NVM Express])
Subsystem: Device 1dbe:5236
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 45
NUMA node: 0
IOMMU group: 29
Region 0: Memory at fce30000 (64-bit, non-prefetchable) [size=16K]
Region 4: Memory at fce20000 (64-bit, non-prefetchable) [size=64K]
Expansion ROM at fce00000 [disabled] [size=128K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: Upstream Port
Capabilities: [b0] MSI-X: Enable+ Count=66 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [d0] Vital Product Data
Product Name: ABCD
End
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [158 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [178 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [19c v1] Lane Margining at the Receiver <?>
Capabilities: [1b4 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
IOVSta: Migration-
Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00
VF offset: 256, stride: 256, Device ID: 5208
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 00000000fce34000 (64-bit, non-prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [1f4 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Capabilities: [1fc v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=32768ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [20c v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [244 v1] Data Link Feature <?>
Kernel driver in use: nvme
After trying to access it and receiving errors:
0e:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. Device 5236 (rev ff) (prog-if ff)
!!! Unknown header type 7f
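The "rev ff", "prog-if ff" and "header type 7f" values above are what lspci shows when config space reads return 0xff bytes (the header type byte is masked with 0x7f to drop the multifunction bit). A minimal sketch of the same check, assuming the standard PCI header layout (revision ID at offset 0x08, programming interface at 0x09, header type at 0x0e) and the sysfs config file for the device address from this report; the function names are hypothetical:

```python
from pathlib import Path

# Standard PCI configuration header offsets.
REVISION_ID = 0x08
PROG_IF = 0x09
HEADER_TYPE = 0x0E


def decode_header(config_bytes):
    """Decode the three header fields that lspci printed in the report."""
    return {
        "revision": config_bytes[REVISION_ID],
        "prog_if": config_bytes[PROG_IF],
        # Bit 7 is the multifunction flag; lspci masks it off.
        "header_type": config_bytes[HEADER_TYPE] & 0x7F,
    }


def looks_inaccessible(config_bytes):
    """True if the fields match the all-ones pattern seen after the failure."""
    hdr = decode_header(config_bytes)
    return (
        hdr["revision"] == 0xFF
        and hdr["prog_if"] == 0xFF
        and hdr["header_type"] == 0x7F
    )


def read_config(bdf="0000:0e:00.0"):
    """Read raw config space from sysfs (Linux only; the address is the
    one from this report and will differ on other systems)."""
    return Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()


# Synthetic examples: an all-0xff header, and a header carrying the
# healthy values from the first lspci dump (rev 01, prog-if 02, type 0).
dead = bytes([0xFF] * 64)
healthy = bytearray(64)
healthy[REVISION_ID] = 0x01
healthy[PROG_IF] = 0x02
healthy[HEADER_TYPE] = 0x00
print(looks_inaccessible(dead), looks_inaccessible(bytes(healthy)))
```

On an affected machine, looks_inaccessible(read_config()) could be run before and after the first access to the drive to see when the device stops responding.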