nvme smart-log intermittently corrupt from family of SSDs
Nick Neumann
nick at pcpartpicker.com
Thu Sep 22 08:37:29 PDT 2022
I noticed something odd with the SMART data from HP FX900 Pro SSDs of
various sizes. Intermittently, after hours of use, host_write_commands
would be 0 (but data_units_written would not). After experimenting, I'm
seeing that intermittently, when running either "smartctl -x" or "nvme
smart-log", the data coming back is, uh, junk? For example:
sudo nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 48 C
available_spare : 100%
available_spare_threshold : 25%
percentage_used : 0%
data_units_read : 793353
data_units_written : 4938926
host_read_commands : 18675956
host_write_commands : 40381344
controller_busy_time : 0
power_cycles : 9
power_on_hours : 15
unsafe_shutdowns : 1
media_errors : 0
num_err_log_entries : 217026295083295649698117473405925064704
Warning Temperature Time : 1323299686
Critical Composite Temperature Time : 4108885430
Temperature Sensor 1 : 38041 C
Temperature Sensor 2 : 55225 C
Temperature Sensor 3 : 23326 C
Temperature Sensor 4 : 38685 C
Temperature Sensor 5 : 57125 C
Temperature Sensor 6 : 45486 C
Temperature Sensor 7 : 59856 C
Temperature Sensor 8 : 2107 C
Thermal Management T1 Trans Count : 1702871148
Thermal Management T2 Trans Count : 312071619
Thermal Management T1 Total Time : 406823074
Thermal Management T2 Total Time : 468107429
Even odder, when the host_write_commands was 0, the rest of the data
was sane. So the incorrectness is not always obvious...
Any recommendations or thoughts on dealing with this? Or am I just out
of luck when it comes to relying on anything from these drives' smart
logs?
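One possible workaround is to sanity-check each reading before trusting it. The following is only a minimal sketch of that idea, not a fix for the drive. It assumes the log has already been parsed into a dict; the key names (for example "temperature_sensor_1" and "warning_temp_time"), the function name, and the 150 C ceiling are all hypothetical choices for illustration. The time check relies on my understanding that the NVMe spec reports the warning temperature time in minutes.

```python
# Hypothetical plausibility checks for one parsed smart-log reading.
# Key names and thresholds are illustrative assumptions, not nvme-cli output.

MAX_PLAUSIBLE_SENSOR_C = 150  # assumed ceiling, not taken from any spec

# Counters that should never go down between two readings.
MONOTONIC = (
    "data_units_read", "data_units_written",
    "host_read_commands", "host_write_commands",
    "power_cycles", "power_on_hours", "unsafe_shutdowns",
)


def suspicious_fields(current, previous=None):
    """Return a list of reasons why `current` looks corrupt (empty if none)."""
    reasons = []

    # Sensor values far above any realistic operating temperature.
    for name, value in current.items():
        if name.startswith("temperature_sensor_") and value > MAX_PLAUSIBLE_SENSOR_C:
            reasons.append(f"{name} = {value} C is implausible")

    # The less obvious case: write commands zeroed while written data is not.
    if current.get("host_write_commands") == 0 and current.get("data_units_written", 0) > 0:
        reasons.append("host_write_commands is 0 but data_units_written is not")

    # Time above the warning temperature (minutes) cannot exceed power-on time.
    minutes_on = (current.get("power_on_hours", 0) + 1) * 60  # +1 h of slack
    if current.get("warning_temp_time", 0) > minutes_on:
        reasons.append("warning_temp_time exceeds total power-on time")

    # Compare against the last reading that passed these checks, if any.
    if previous is not None:
        for name in MONOTONIC:
            if name in current and name in previous and current[name] < previous[name]:
                reasons.append(f"{name} decreased since the previous reading")

    return reasons
```

A caller could retry the query whenever the returned list is non-empty and keep the last reading that passed as `previous`. This only catches corruption that violates one of these rules, so it reduces the risk of acting on junk data without eliminating it.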