[PATCH 0/2] ARM Error Source Table V1 Support
Ruidong Tian
tianruidong at linux.alibaba.com
Mon Mar 4 03:15:15 PST 2024
This series adds support for the Arm Error Source Table (AEST) based on
version 1.1 of the ACPI for the Armv8 RAS Extensions specification [0].
The Arm Error Source Table (AEST) enables kernel-first handling of errors
in a system that supports the Armv8 RAS extensions. A hardware error
triggers a RAS interrupt to the kernel. In irq context, the kernel scans
all AEST nodes to find the node that reported the error, then uses a
workqueue to log the hardware error.
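The two-stage flow described above (scan in irq context, log later from a
workqueue) can be illustrated with a minimal userspace sketch. It is not
code from this series: struct node, struct event, irq_scan(), log_worker()
and the STATUS_VALID bit are all hypothetical names, the status register
is simulated by a plain variable, and a fixed array stands in for the
kernel's deferred-work machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PENDING  8
#define STATUS_VALID (1u << 0)   /* hypothetical "error recorded" bit */

struct node {                    /* hypothetical stand-in for an AEST node */
	const char *name;
	uint32_t status;         /* simulated error status register */
};

struct event {                   /* snapshot handed to the deferred logger */
	const char *name;
	uint32_t status;
};

static struct event pending[MAX_PENDING];
static size_t npending;

/*
 * Stage 1, "irq context": scan every node, snapshot the ones that report
 * an error, and clear the simulated status. Nothing is printed here.
 * Returns the number of nodes that reported an error; snapshots are
 * dropped if the pending array is full.
 */
static size_t irq_scan(struct node *nodes, size_t n)
{
	size_t found = 0;

	assert(nodes != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!(nodes[i].status & STATUS_VALID))
			continue;
		if (npending < MAX_PENDING) {
			pending[npending].name = nodes[i].name;
			pending[npending].status = nodes[i].status;
			npending++;
		}
		nodes[i].status = 0;
		found++;
	}
	return found;
}

/*
 * Stage 2, "workqueue context": log every queued snapshot and drain the
 * queue. Returns the number of events logged.
 */
static size_t log_worker(void)
{
	size_t logged = npending;

	for (size_t i = 0; i < npending; i++)
		printf("error on %s: status=0x%x\n",
		       pending[i].name, (unsigned int)pending[i].status);
	npending = 0;
	return logged;
}
```

Keeping the scan free of printing is the point of the split: the
interrupt path only copies a small record, and the slower logging work
runs later outside irq context.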
I have tested this series on the PTG Yitian710 SoC. Both corrected and
uncorrected errors were tested to verify the non-fatal vs fatal
scenarios.
Future work:
1. Have UEs trigger memory_failure() rather than a panic.
2. Add CE storm mitigation.
3. Support AEST V2.
This series is based on Tyler Baicar's patches [1], for which no v2 has
been sent to the mailing list yet. Changes from the original patches:
1. Add a genpool to collect all AEST errors, and log them in a workqueue
rather than in irq context.
2. Use a single aest_proc function for both the system register interface
and the MMIO interface.
3. Restructure some structures and functions to make them clearer.
4. Address all review comments from the mailing list thread on Tyler
Baicar's patches.
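One common way to share a single processing function between two register
interfaces, as in change 2, is to hide the access method behind a function
pointer. The sketch below is a userspace illustration under that
assumption and is not taken from the series: struct err_iface,
proc_record(), the reader functions and the STATUS_* bit positions are
hypothetical, and plain variables simulate both the system register and
the MMIO location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this illustration. */
#define STATUS_VALID (1ull << 0)   /* an error record is present */
#define STATUS_UE    (1ull << 1)   /* the error is uncorrected */

enum err_kind { ERR_NONE, ERR_CORRECTED, ERR_UNCORRECTED };

/* One accessor per interface; the processing code never sees which. */
struct err_iface {
	uint64_t (*read_status)(void *ctx);
	void *ctx;
};

static uint64_t fake_sysreg;       /* stands in for a system register */

static uint64_t sysreg_read(void *ctx)
{
	(void)ctx;                 /* system registers need no address */
	return fake_sysreg;
}

static uint64_t mmio_read(void *ctx)
{
	return *(volatile uint64_t *)ctx;  /* ctx points at the "register" */
}

/* Single processing path used for both interfaces. */
static enum err_kind proc_record(const struct err_iface *iface)
{
	uint64_t status;

	assert(iface != NULL && iface->read_status != NULL);
	status = iface->read_status(iface->ctx);
	if (!(status & STATUS_VALID))
		return ERR_NONE;
	return (status & STATUS_UE) ? ERR_UNCORRECTED : ERR_CORRECTED;
}
```

With this shape, adding another access method would only require a new
reader function; the classification logic stays in one place.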
[0]: https://developer.arm.com/documentation/den0085/0101/
[1]: https://lore.kernel.org/all/20211124170708.3874-1-baicar@os.amperecomputing.com/
Tyler Baicar (2):
ACPI/AEST: Initial AEST driver
trace, ras: add ARM RAS extension trace event
MAINTAINERS | 11 +
arch/arm64/include/asm/ras.h | 38 ++
drivers/acpi/arm64/Kconfig | 10 +
drivers/acpi/arm64/Makefile | 1 +
drivers/acpi/arm64/aest.c | 728 +++++++++++++++++++++++++++++++++++
include/linux/acpi_aest.h | 91 +++++
include/linux/cpuhotplug.h | 1 +
include/ras/ras_event.h | 55 +++
8 files changed, 935 insertions(+)
create mode 100644 arch/arm64/include/asm/ras.h
create mode 100644 drivers/acpi/arm64/aest.c
create mode 100644 include/linux/acpi_aest.h
--
2.33.1