Advice appreciated - attempting data recovery on Intel Optane H10 (CSTS=0x0?)
Jeff Johnson
jeff.johnson at aeoncomputing.com
Fri Dec 8 18:12:28 PST 2023
Greetings,
Seeking wisdom, many thanks to anyone who stops to read and more if
you have advice.
I'm attempting data recovery on an odd little SSD Intel built called
the H10. It is an M.2 NVMe SSD with two controllers and two different
flash devices. The M.2 is bifurcated from x4 into x2x2 with a
different controller and flash device behind each x2.
I was able to access the low-side x2 (lanes 0-1), which connects to the
32GB 3D XPoint device. It was a cache, so not much valuable data.
I had to do some lab gymnastics to access the high side x2 but I was
successful. Now I can see the other controller and the link speed and
width are fine, but it won't initialize. To get this far I had to disable
PCIe spread spectrum in the system BIOS and use Kapton tape to mask
the PCIe transmit and receive contacts for lanes 0 and 1, so that the
high-side x2 (lanes 2-3) were the only lanes the root complex saw and
connected. Crazy, I know, but it worked.
System: Rocky Linux 9.3, kernel 5.14.0-362.8.1.el9_3.x86_64. The nvme driver logs:
[ 12.077360] nvme nvme0: pci function 0000:06:00.0
[ 72.080442] nvme nvme0: Device not ready; aborting initialisation, CSTS=0x0
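For anyone reading the CSTS value in that message, here is a minimal sketch that splits an NVMe Controller Status (CSTS) register value into its fields. The bit positions follow the NVMe base specification (RDY bit 0, CFS bit 1, SHST bits 3:2, NSSRO bit 4, PP bit 5); the function name decode_csts is made up for illustration. A value of 0x0 decodes to "not ready, no fatal status, not shutting down", which only says the controller never raised RDY, not why.

```python
def decode_csts(csts: int) -> dict:
    """Split an NVMe CSTS (Controller Status) register value into its fields."""
    shst_names = {
        0: "normal operation",
        1: "shutdown in progress",
        2: "shutdown complete",
        3: "reserved",
    }
    return {
        "RDY": bool(csts & 0x01),            # bit 0: controller ready
        "CFS": bool(csts & 0x02),            # bit 1: controller fatal status
        "SHST": shst_names[(csts >> 2) & 0x3],  # bits 3:2: shutdown status
        "NSSRO": bool(csts & 0x10),          # bit 4: NVM subsystem reset occurred
        "PP": bool(csts & 0x20),             # bit 5: processing paused
        # All-ones usually means the read never reached the device.
        "all_ones": (csts & 0xFFFFFFFF) == 0xFFFFFFFF,
    }


if __name__ == "__main__":
    # Value taken from the kernel message above.
    for field, value in decode_csts(0x0).items():
        print(f"{field:8s} {value}")
```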
The controller appears in lspci:
# lspci -s 06:00.0
06:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier] (rev 03) (prog-if 02 [NVM Express])
Verbose output shows a good connection. LnkSta: Speed 8GT/s (ok), Width x2 (ok)
# lspci -s 06:00.0 -vvv
06:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H10 with Solid State Storage [Teton Glacier] (rev 03) (prog-if 02 [NVM Express])
        Subsystem: Intel Corporation Device 8410
        Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 79
        NUMA node: 0
        Region 0: Memory at fb500000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable+ 64bit+
                Address: 0000000000000000 Data: 0000
                Masking: 00000000 Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM not supported
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x2 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
                         EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [b0] MSI-X: Enable- Count=16 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00002100
        Capabilities: [100 v2] Advanced Error Reporting
                UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [158 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn- PerformEqu-
                LaneErrStat: 0
        Capabilities: [178 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [180 v1] L1 PM Substates
                L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                          PortCommonModeRestoreTime=100us PortTPowerOnTime=3100us
                L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
                           T_CommonMode=0us LTR1.2_Threshold=0ns
                L1SubCtl2: T_PwrOn=10us
        Kernel modules: nvme
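As a sanity check on the LnkSta line, the following sketch pulls speed and width out of an lspci "LnkSta:" line and estimates the raw link bandwidth. The helper names parse_lnksta and raw_gbytes_per_s are hypothetical, and the figure ignores TLP/DLLP protocol overhead, so real throughput would be lower. It assumes 8b/10b encoding below 8 GT/s and 128b/130b from 8 GT/s up to 32 GT/s.

```python
import re


def parse_lnksta(line: str) -> tuple:
    """Return (speed in GT/s, lane count) from an lspci 'LnkSta:' line."""
    m = re.search(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)", line)
    if m is None:
        raise ValueError(f"not a LnkSta line: {line!r}")
    return float(m.group(1)), int(m.group(2))


def raw_gbytes_per_s(speed_gt: float, width: int) -> float:
    """Raw one-direction link bandwidth in GB/s, before protocol overhead."""
    # 2.5 and 5 GT/s use 8b/10b encoding; 8, 16 and 32 GT/s use 128b/130b.
    efficiency = 128 / 130 if speed_gt >= 8.0 else 8 / 10
    return speed_gt * width * efficiency / 8  # bits -> bytes


if __name__ == "__main__":
    # Line copied from the lspci output above.
    speed, width = parse_lnksta("LnkSta: Speed 8GT/s (ok), Width x2 (ok)")
    print(f"{speed} GT/s x{width}: {raw_gbytes_per_s(speed, width):.3f} GB/s raw")
```

For the x2 Gen3 link reported here this comes to a little under 2 GB/s per direction, so the link itself is not the obstacle.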
Even though the controller is talking on PCIe, Intel's SSD tools such as
isdct and intelmas do not see the controller when run.
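Since lspci shows Region 0 mapped with Mem+, one way to look at the controller without the vendor tools is to read the NVMe registers straight from BAR0 through sysfs. The sketch below is untested against this device and rests on several assumptions: the register offsets (CAP 0x00, VS 0x08, CC 0x14, CSTS 0x1C) are from the NVMe base specification, the sysfs resource0 path is the standard Linux PCI layout, root is needed, kernel lockdown may refuse the mmap, and Python's mmap gives no guarantee about MMIO access width (a small C tool would be more rigorous). The function names are made up for illustration.

```python
import mmap
import os
import struct

# Register offsets within BAR0, per the NVMe base specification.
NVME_REG_CAP = 0x00   # 64-bit Controller Capabilities
NVME_REG_VS = 0x08    # 32-bit Version
NVME_REG_CC = 0x14    # 32-bit Controller Configuration
NVME_REG_CSTS = 0x1C  # 32-bit Controller Status


def parse_nvme_registers(bar: bytes) -> dict:
    """Decode a few controller registers from a little-endian BAR0 snapshot."""
    (cap,) = struct.unpack_from("<Q", bar, NVME_REG_CAP)
    (vs,) = struct.unpack_from("<I", bar, NVME_REG_VS)
    (cc,) = struct.unpack_from("<I", bar, NVME_REG_CC)
    (csts,) = struct.unpack_from("<I", bar, NVME_REG_CSTS)
    return {
        "CAP": cap,
        "VS": vs,
        "CC": cc,
        "CSTS": csts,
        "version": f"{vs >> 16}.{(vs >> 8) & 0xFF}",       # MJR.MNR
        "CAP.TO_seconds": ((cap >> 24) & 0xFF) * 0.5,      # TO is in 500 ms units
        "CC.EN": bool(cc & 0x1),
        "CSTS.RDY": bool(csts & 0x1),
        "CSTS.CFS": bool(csts & 0x2),
        # All-ones usually means the read did not reach the device.
        "all_ones": csts == 0xFFFFFFFF,
    }


def read_bar0(bdf: str = "0000:06:00.0", length: int = 0x40) -> bytes:
    """Snapshot the first bytes of BAR0 via sysfs (assumed path; needs root)."""
    path = f"/sys/bus/pci/devices/{bdf}/resource0"
    fd = os.open(path, os.O_RDONLY | os.O_SYNC)
    try:
        with mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED, mmap.PROT_READ) as mm:
            return bytes(mm[:length])
    finally:
        os.close(fd)


# Hypothetical usage on the affected host, as root:
#   for name, value in parse_nvme_registers(read_bar0()).items():
#       print(name, value)
```

If CAP and VS read back as sensible values while CSTS.RDY stays clear after CC.EN is set, the PCIe front end is answering and the failure is further back in the controller. If everything reads as all-ones, the reads are not reaching the device at all.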
I'm not going to use this device long term, just trying to access the
QLC flash device behind the controller to rescue data for someone
desperate. Not a paying gig, being kind and charitable during the
holidays.
Is there some vendor-unique magic required to wake this up or activate
the controller-to-flash path?
Not only do Intel's SSD tools not see it, nvme-cli doesn't see it
either. It is as if the device is alive but not reporting as storage
or a block device.
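To make "alive but not reporting as storage" concrete, this is a small sketch that lists which NVMe controllers and namespace block devices the kernel currently exposes in sysfs. It assumes the usual Linux layout (/sys/class/nvme for controllers, /sys/block/nvme* for namespaces); the function name nvme_visibility and the sysfs_root parameter are hypothetical, the latter only there so the function can be pointed at a test directory.

```python
from pathlib import Path


def nvme_visibility(sysfs_root: str = "/sys") -> dict:
    """List NVMe controllers and namespace block devices known to the kernel."""
    root = Path(sysfs_root)
    class_dir = root / "class" / "nvme"   # one entry per registered controller
    block_dir = root / "block"            # one entry per block device

    controllers = []
    if class_dir.is_dir():
        controllers = sorted(p.name for p in class_dir.iterdir())

    namespaces = []
    if block_dir.is_dir():
        namespaces = sorted(
            p.name for p in block_dir.iterdir() if p.name.startswith("nvme")
        )

    return {"controllers": controllers, "namespaces": namespaces}


if __name__ == "__main__":
    state = nvme_visibility()
    print("controllers:", state["controllers"] or "none")
    print("namespaces: ", state["namespaces"] or "none")
```

A PCI function that shows up in lspci but has no entry under either directory matches what is described here: the device enumerates on the bus, but the nvme driver never finished bringing it up, so nvme-cli has nothing to open.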
Any advice is greatly appreciated.
--Jeff