Are AER corrected errors worrying?
Samuel Thibault
samuel.thibault at ens-lyon.org
Mon Jan 4 15:12:47 EST 2021
Hello,
Vidya Sagar wrote:
> Since this is a laptop, I'm suspecting that ASPM states might have
> been enabled which could be causing these errors.
Keith Busch, on Mon 04 Jan 2021 10:44:35 -0800, wrote:
> Sometimes these types of errors occur from low power settings, so you
> can try disabling the automatic management of these (assuming the
> hardware supports it). To disable nvme specific power state transitions,
> the kernel parameter is "nvme_core.default_ps_max_latency_us=0".
I have tried to add it, and this one line changed in lspci -vv:
02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN750 / PC SN730 NVMe SSD (prog-if 02 [NVM Express])
[...]
Capabilities: [c0] Express (v2) Endpoint, MSI 00
[...]
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
turned to
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
That last value happens to be what I was seeing for that line with the
manufacturer-provided Ubuntu Linux kernel.
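To make the difference between the two DevSta lines explicit, here is a minimal sketch that parses the flags and reports which ones flipped. The helper names (parse_devsta, changed_flags) are hypothetical; the two input strings are copied from the lspci -vv output above.

```python
def parse_devsta(line):
    """Parse an lspci 'DevSta:' line into a {flag_name: is_set} mapping."""
    _, _, rest = line.partition("DevSta:")
    flags = {}
    for token in rest.split():
        # lspci marks a set bit with a trailing '+' and a clear bit with '-'.
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return flags


def changed_flags(before, after):
    """Return {flag_name: (old, new)} for every flag whose state differs."""
    old, new = parse_devsta(before), parse_devsta(after)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }


before = "DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-"
after = "DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-"

print(changed_flags(before, after))
# prints {'CorrErr': (False, True), 'UnsupReq': (False, True)}
```

Only CorrErr and UnsupReq differ between the two lines; the other four flags stay clear.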
So far (30 minutes of uptime) there has been no corrected error report.
I'll watch over the coming hours/days to see whether that avoided the
issue. I wasn't able to trigger such corrected errors by loading the
machine, so possibly I should have been trying the converse: letting it
go into low power :)
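One way to watch for new corrected errors without grepping the kernel log is to poll the per-device AER counters in sysfs. The sketch below rests on several assumptions: that the running kernel exposes an aer_dev_correctable file for the device (see the kernel's sysfs AER statistics ABI documentation), that each line of that file ends with an integer count, and that the device lives in PCI domain 0000 (the lspci output above only shows 02:00.0). The function names are hypothetical.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse 'Name With Spaces <count>' lines into a {name: count} dict."""
    counters = {}
    for line in text.splitlines():
        # The count is the last whitespace-separated field on each line.
        name, _, count = line.strip().rpartition(" ")
        if name and count.isdigit():
            counters[name.strip()] = int(count)
    return counters


def read_corrected_total(bdf="0000:02:00.0"):
    """Return the corrected-error total for a device, or None if unavailable.

    The default address assumes PCI domain 0000; adjust it for your system.
    """
    path = Path("/sys/bus/pci/devices") / bdf / "aer_dev_correctable"
    try:
        text = path.read_text()
    except OSError:
        # File missing (no AER support or older kernel) or not readable.
        return None
    return parse_aer_counters(text).get("TOTAL_ERR_COR")


if __name__ == "__main__":
    print(read_corrected_total())
```

Running this before and after an idle period and comparing the two values would show whether any corrected errors were counted in between.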
> PCI also has automatic link power savings that you can disable with
> parameter "pcie_aspm=off".
I'll try that if I still see errors with the nvme_core parameter.
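To keep track of which of the two suggested parameters a given boot actually used, a small sketch like the following can read the kernel command line. It assumes a Linux system with /proc/cmdline; the function names are hypothetical, and the whitespace split does not handle quoted values that contain spaces.

```python
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def power_saving_overrides(text):
    """Report the two power-related parameters discussed in this thread."""
    params = parse_cmdline(text)
    return {
        "nvme_core.default_ps_max_latency_us": params.get(
            "nvme_core.default_ps_max_latency_us"
        ),
        "pcie_aspm": params.get("pcie_aspm"),
    }


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print(power_saving_overrides(cmdline.read_text()))
```

A value of None means the parameter was not given on the command line, so the kernel default applies.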
Samuel