[PATCH v2 0/2] vmcoreinfo: Expose hardware error recovery statistics via sysfs
Breno Leitao
leitao at debian.org
Tue Feb 10 01:11:41 PST 2026
Hello Andrew,
On Mon, Feb 02, 2026 at 06:27:38AM -0800, Breno Leitao wrote:
> The kernel already tracks recoverable hardware errors (CPU, memory, PCI,
> CXL, etc.) in the hwerr_data array for vmcoreinfo crash dump analysis.
> However, this data is only accessible after a crash.
>
> This series adds a sysfs directory at /sys/kernel/hwerr_recovery_stats/ to
> expose these statistics at runtime, allowing monitoring tools to track
> hardware health without requiring a kernel crash.
>
> The directory contains one file per error subsystem:
> /sys/kernel/hwerr_recovery_stats/{cpu, memory, pci, cxl, others}
>
> Each file contains a single integer representing the error count.
>
> This is useful for:
> - Proactive detection of failing hardware components
> - Time-series tracking of recoverable errors
> - System health monitoring in cloud environments
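
For anyone who wants to try the interface from user space, a monitoring
tool could poll these files roughly as follows. This is only a minimal
sketch: it assumes the directory and the five file names listed above,
with one integer per file. The helpers read_hwerr_stats() and diff_stats()
are hypothetical names for illustration and are not part of the series.

```python
from pathlib import Path

# Path and file names taken from the cover letter above; adjust them if
# the merged interface ends up different.
HWERR_DIR = Path("/sys/kernel/hwerr_recovery_stats")
SUBSYSTEMS = ("cpu", "memory", "pci", "cxl", "others")


def read_hwerr_stats(base=HWERR_DIR, names=SUBSYSTEMS):
    """Return {subsystem: count}, skipping missing or unparsable files."""
    stats = {}
    for name in names:
        try:
            # Each file is assumed to hold a single integer.
            stats[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Older kernels will not have the directory; ignore such files.
            continue
    return stats


def diff_stats(previous, current):
    """Return the subsystems whose count grew since the previous sample."""
    return {
        name: count - previous.get(name, 0)
        for name, count in current.items()
        if count > previous.get(name, 0)
    }
```

Calling read_hwerr_stats() periodically and feeding two consecutive samples
to diff_stats() gives the per-subsystem increase, which is the value a
time-series collector would typically export.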

Is there a chance this could be included in the 6.20 merge window?

Thanks,
--breno