Are AER corrected errors worrying?

Keith Busch kbusch at kernel.org
Mon Jan 4 13:44:35 EST 2021


On Fri, Jan 01, 2021 at 11:40:28PM +0100, Samuel Thibault wrote:
> Hello,
> 
> Our lab has bought a new Dell Latitude 5410 laptop. I installed Debian
> bullseye on it with kernel 5.9.0-5-amd64, but it is spitting out these
> errors now and then (sometimes a dozen per minute):
> 
> Jan  1 23:30:53 begin kernel: [   46.675818] pcieport 0000:00:1d.0: AER: Corrected error received: 0000:02:00.0
> Jan  1 23:30:53 begin kernel: [   46.675933] nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
> Jan  1 23:30:53 begin kernel: [   46.676048] nvme 0000:02:00.0:   device [15b7:5006] error status/mask=00000001/0000e000
> Jan  1 23:30:53 begin kernel: [   46.676140] nvme 0000:02:00.0:    [ 0] RxErr
> 
> Since it's corrected it's not actually an issue, but how worrying is it
> to see such errors on new hardware? Documentation/PCI/pcieaer-howto.rst
> does not say whether we are really supposed to see some of them. I
> see forums recommending pci=noaer to stop the error logging, but is
> that really the right thing to do?

Additional work has to happen to correct a receiver error, so it's
possible you're getting degraded performance. You may not notice worse
performance if these are infrequent enough, though.

Errors like these are sometimes triggered by low-power states, so you
can try disabling the automatic power management (assuming the hardware
supports it). To disable NVMe-specific power state transitions, use the
kernel parameter "nvme_core.default_ps_max_latency_us=0". PCIe also has
automatic link power saving (ASPM), which you can disable with the
parameter "pcie_aspm=off". It might be worth checking whether either of
those changes what you observe.
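
To compare before and after booting with one of those parameters, it may
help to count the AER reports per device instead of eyeballing the log.
The following is a minimal sketch, not an existing tool: the function
name count_aer_errors and the regular expression are assumptions, and
the pattern is based only on the message format shown in the quoted log
above, which may differ on other kernel versions.

```python
import re
from collections import Counter

# Matches the per-device AER summary line as it appears in the quoted log, e.g.
# "nvme 0000:02:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)"
# Assumption: other kernel versions may format this line differently.
AER_LINE = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<severity>[^,]+), type=(?P<type>[^,]+),"
)


def count_aer_errors(lines):
    """Count AER reports per (device address, severity, error type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            key = (match["bdf"], match["severity"], match["type"])
            counts[key] += 1
    return counts


# Sample input copied from the log excerpt in the original report.
SAMPLE_LOG = [
    "Jan  1 23:30:53 begin kernel: [   46.675818] pcieport 0000:00:1d.0: "
    "AER: Corrected error received: 0000:02:00.0",
    "Jan  1 23:30:53 begin kernel: [   46.675933] nvme 0000:02:00.0: "
    "PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)",
]

for (bdf, severity, kind), n in count_aer_errors(SAMPLE_LOG).most_common():
    print(f"{n:6d}  {bdf}  {severity}  {kind}")
```

Feeding the function the lines of the kernel log from one boot (for
example the output of "journalctl -k -b") gives a per-device total that
can be compared across boots with and without the parameters.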
