[PATCH] NVMe: Log Sense Temperature doesn't handle negative values

Jon Derrick jonathan.derrick at intel.com
Fri Oct 9 13:41:08 PDT 2015


On Fri, Oct 09, 2015 at 02:15:33PM -0600, Jon Derrick wrote:
> > >To meet the SCSI Command Spec (T10 SPC-4 r37), the log sense temperature cannot
> > >be negative.   From the spec:
> > >
> > >The TEMPERATURE field indicates the temperature of the SCSI target device in
> > >degrees Celsius at the time the LOG SENSE command is performed. Temperatures
> > >equal to or less than zero degrees Celsius shall cause the TEMPERATURE field
> > >to be set to zero. If the device server is unable to detect a valid
> > >temperature because of a sensor failure or other condition, then the
> > >TEMPERATURE field shall be set to FFh. The temperature should be reported
> > >with an accuracy of plus or minus three Celsius degrees while the SCSI target
> > >device is operating at a steady state within its environmental limits.
> > >
> > >With the current code, a value of -2 C will be shown as 254 C since it was not
> > >written to handle negative vales.  This same issue also exists in
> > >nvme_trans_log_info_exceptions.
> > 
> > [snip]
> > 
> > >@@ -1097,7 +1101,7 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, struct sg_io_hdr *hdr,
> > > 	dma_addr_t dma_addr;
> > > 	void *mem;
> > > 	u32 feature_resp;
> > >-	u8 temp_c_cur, temp_c_thresh;
> > >+	s8 temp_c_cur, temp_c_thresh;
> > > 	u16 temp_k;
> > > 	log_response = kzalloc(LOG_TEMP_PAGE_LENGTH, GFP_KERNEL);
> > >@@ -1129,6 +1133,10 @@ static int nvme_trans_log_temperature(struct nvme_ns *ns, struct sg_io_hdr *hdr,
> > > 		temp_k = (smart_log->temperature[1] << 8) +
> > > 				(smart_log->temperature[0]);
> > > 		temp_c_cur = temp_k - KELVIN_TEMP_FACTOR;
> > >+		/* if temp_c_cur is negative,          */
> > >+		/* set to 0 to meet Scsi Command Spec  */
> > >+		if (temp_c_cur < 0)
> > >+			temp_c_cur = 0;
> > 
> > Interesting, < 0C is colder than I might have expected. What if they
> > can get hotter than expected, like 128C? I don't think we want to report
> > that as 0.
> >
> 
> It's unfortunate the spec is written that way. Aerospace silicon is often rated for -55/-40C to +125C junction temp +/- a few %. Thats a big negative range missing if it has to be 0. I would imagine over 128C is not meaningful so we would want to report that as 128C
> 

Thinking about it again, I would much rather see the u8 being used so we get the full range.

something like this should do:
temp_c_cur = temp_k < KELVIN_TEMP_FACTOR ? 0 : temp_k - KELVIN_TEMP_FACTOR;




More information about the Linux-nvme mailing list