Linux 4.9.8 + NVMe CiB Issue

Marc Smith marc.smith at mcc.edu
Fri Mar 31 06:26:32 PDT 2017


Just an update in case anyone else stumbles upon this thread with the
same issue: Intel reviewed the logs from the drives and determined the
NVMe drives are hitting their maximum operating temperature and then
shutting down.

--Marc
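
For anyone wanting to check for the same thermal condition, here is a minimal sketch (not from the original thread) of decoding the temperature fields of the NVMe SMART / Health Information log page (Log Identifier 02h). It assumes the raw log page bytes have already been read from the drive by some other tool; that step is not shown. The function name parse_smart_temperature is hypothetical, chosen for illustration. Per the NVMe specification, byte 0 is the Critical Warning field (bit 1 signals a temperature threshold condition) and bytes 1-2 hold the Composite Temperature in Kelvin, little-endian.

```python
import struct

# Bit 1 of the Critical Warning byte: composite temperature is over an
# over-temperature threshold or under an under-temperature threshold.
CRIT_WARN_TEMPERATURE = 0x02


def parse_smart_temperature(log_page: bytes):
    """Return (temperature_celsius, temperature_warning) from a raw
    SMART / Health Information log page buffer.

    Hypothetical helper for illustration; only bytes 0-2 are examined.
    """
    if len(log_page) < 3:
        raise ValueError("need at least the first 3 bytes of the log page")

    critical_warning = log_page[0]
    # Composite Temperature: 16-bit little-endian value in Kelvin at offset 1.
    (kelvin,) = struct.unpack_from("<H", log_page, 1)
    # Integer Kelvin-to-Celsius approximation (273 rather than 273.15).
    celsius = kelvin - 273
    return celsius, bool(critical_warning & CRIT_WARN_TEMPERATURE)


if __name__ == "__main__":
    # Synthetic 512-byte page: temperature warning bit set, 358 K reported.
    sample = bytes([0x02]) + struct.pack("<H", 358) + bytes(509)
    print(parse_smart_temperature(sample))
```

Polling this while running the heavy I/O load that triggers the drop-outs would show whether the temperature climbs toward the drive's limit before a drive disappears. The actual shutdown temperature is drive-specific and is not given in this thread.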

On Thu, Mar 9, 2017 at 7:27 PM, Keith Busch <keith.busch at intel.com> wrote:
> On Thu, Mar 09, 2017 at 03:25:01PM -0500, Marc Smith wrote:
>> Hi,
>>
>> We opened a support case with Intel (# 02641251) but we aren't making
>> much progress... they looked at log files from /var/log/ on our system
>> and now seem to be blaming warning/error log lines from
>> Corosync/Pacemaker on the issue. =)
>>
>> We're still able to reproduce this issue with high load... the NVMe
>> drives "drop out", randomly, not always the same drive, when pushing
>> I/O across all of our NVMe drives.
>>
>> We don't mind continuing to push on the Intel NVMe hardware support
>> front, but we're looking for confirmation that, based on the kernel
>> log messages posted, this is believed to be an issue with the NVMe
>> drives themselves (e.g., firmware)?
>
> This genuinely looks like the drives have stopped responding and needs
> to be directed to the drive vendor.


