Are AER corrected errors worrying?

Samuel Thibault samuel.thibault at ens-lyon.org
Mon Jan 4 15:12:47 EST 2021


Hello,

Vidya Sagar wrote:
> Since this is a laptop, I'm suspecting that ASPM states might have
> been enabled which could be causing these errors.

Keith Busch wrote on Mon, 04 Jan 2021 10:44:35 -0800:
> Sometimes these types of errors occur from low power settings, so you
> can try disabling the automatic management of these (assuming the
> hardware supports it). To disable nvme specific power state transitions,
> the kernel parameter is "nvme_core.default_ps_max_latency_us=0".

I added it, and this one line changed in the lspci -vv output:

02:00.0 Non-Volatile memory controller: Sandisk Corp WD Black SN750 / PC SN730 NVMe SSD (prog-if 02 [NVM Express])
[...]
	Capabilities: [c0] Express (v2) Endpoint, MSI 00
[...]
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
changed to
		DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-

That last value is also what I was seeing for that line with the
manufacturer-provided Ubuntu Linux kernel.
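To compare such DevSta lines mechanically, here is a small sketch. It assumes only the `Name+` / `Name-` format shown above, and `parse_devsta` and `changed_flags` are hypothetical helper names, not part of lspci or any existing tool. In the PCIe Device Status register, CorrErr+ means a correctable error was detected and UnsupReq+ an unsupported request; these are write-1-to-clear status bits, so a "+" only says such an event happened since they were last cleared, not when.

```python
import re

def parse_devsta(line: str) -> dict[str, bool]:
    """Map each flag on an lspci 'DevSta:' line to True ('+') or False ('-')."""
    body = line.split("DevSta:", 1)[1]
    # Each flag is a word immediately followed by '+' or '-'.
    return {name: sign == "+" for name, sign in re.findall(r"(\w+)([+-])", body)}

def changed_flags(before: str, after: str) -> dict[str, tuple[bool, bool]]:
    """Return {flag: (old, new)} for flags whose value differs between two lines."""
    old, new = parse_devsta(before), parse_devsta(after)
    return {k: (old[k], new[k]) for k in old if k in new and old[k] != new[k]}

before = "\t\tDevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-"
after = "\t\tDevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-"

if __name__ == "__main__":
    print(changed_flags(before, after))
    # prints {'CorrErr': (False, True), 'UnsupReq': (False, True)}
```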

So far (30 minutes of uptime) there has been no corrected error report;
I'll keep watching over the coming hours and days to see whether this
avoided the issue. I wasn't able to trigger such corrected errors by
loading the machine, so perhaps I should have been trying the converse:
letting it go into low power states :)
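For watching over hours or days, one option is to sample a corrected-error counter periodically instead of grepping logs. This is a minimal sketch under assumptions: that the kernel exposes AER statistics in sysfs as /sys/bus/pci/devices/<BDF>/aer_dev_correctable, that the file holds one "NAME COUNT" pair per line including a TOTAL_ERR_COR line, and that the SSD's full address is 0000:02:00.0 (domain 0000 assumed from the 02:00.0 above). The helper names and the sample text are made up for illustration.

```python
from pathlib import Path

def total_corrected(aer_text: str) -> int:
    """Return the TOTAL_ERR_COR count from the text of an aer_dev_correctable file."""
    for line in aer_text.splitlines():
        name, _, value = line.partition(" ")
        if name == "TOTAL_ERR_COR":
            return int(value)
    raise ValueError("no TOTAL_ERR_COR line found")

def read_total(bdf: str) -> int:
    """Read the counter for one device; bdf such as "0000:02:00.0" is an assumption."""
    path = Path("/sys/bus/pci/devices") / bdf / "aer_dev_correctable"
    return total_corrected(path.read_text())

# Made-up sample contents, only to illustrate the assumed "NAME COUNT" format.
SAMPLE = "RxErr 0\nBadTLP 2\nBadDLLP 1\nTOTAL_ERR_COR 3\n"

if __name__ == "__main__":
    print(total_corrected(SAMPLE))
    device = "0000:02:00.0"
    # Only read the real file if this kernel/device actually provides it.
    if (Path("/sys/bus/pci/devices") / device / "aer_dev_correctable").exists():
        print(device, read_total(device))
```

Taking one reading now and another a few hours later would show whether errors are still being corrected, even if nothing obvious shows up in the logs.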

> PCI also has automatic link power savings that you can disable with
> parameter "pcie_aspm=off".

I'll try that if I still see errors with the nvme_core parameter.
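To double-check which of the two parameters the running kernel was actually booted with, here is a minimal sketch that inspects the kernel command line (as found in /proc/cmdline). `cmdline_params` and `power_saving_overrides` are hypothetical names, and the plain whitespace split is a simplification that does not handle quoted values.

```python
from pathlib import Path

def cmdline_params(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ""."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # The kernel treats '-' and '_' in parameter names as equivalent.
        params[key.replace("-", "_")] = value if sep else ""
    return params

def power_saving_overrides(cmdline: str) -> dict[str, bool]:
    """Report whether the two power-saving overrides from this thread are set."""
    p = cmdline_params(cmdline)
    return {
        "nvme_apst_disabled": p.get("nvme_core.default_ps_max_latency_us") == "0",
        "pcie_aspm_off": p.get("pcie_aspm") == "off",
    }

if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print(power_saving_overrides(cmdline.read_text()))
```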

Samuel



More information about the Linux-nvme mailing list