Arm RAS EDAC & AEST table driver
Konrad Dybcio
konrad.dybcio at linaro.org
Sat Dec 10 04:24:06 PST 2022
Hi!
I've been working on an EDAC driver utilizing Arm RAS for
some time now. It's based on the (many) previous efforts by
Qualcomm and Ampere folks. In its current state, it supports
system register-based operation and error detection on CPU
caches on DT platforms. I've started adding code for ACPI, but
that has not been validated at all yet (other than compile
testing), as I don't have any suitable hardware (RAS extensions
& some kind of UEFI & a proper AEST table) to run it on.
I made it by taking the Qualcomm Kryo (fancy Cortex)-specific
driver, which was written just for CPU cache error detection,
and generalizing it to the point where (I think) it's ready
to handle all the configurations a generic implementation
should, once support for them is added (modulo IMPLEMENTATION
DEFINED behavior/vendor screw-ups, of course..).
I've tried to make it extensible, so that support for other
error sources (SMMU/GIC/vendor-specific/RAM/CPU_TLB/
CPU_"generic") and implementations using MMIO registers can
be added later on (again, I have no means of testing most of
these things).
Could I ask you folks for a general/first-impressions review
of this driver? Smoke testing would also be appreciated..
As far as I'm aware, generating errors yourself for testing
purposes is only possible if the AEST node interface is an
MMIO one, as ERRnPFGCDN seems to only be accessible that way..
Either my boards are super stable and never throw errors or
the driver simply doesn't detect them, hard to tell :)
It probably leaks memory like crazy and there are some obvious
style issues, but please take a look at the general structure
and share your opinions, especially if (when) you find errors!
You can consider this a v(0.01), I suppose..
Available over at [1], with an example DT part for QC SM8250.
Konrad
[1] https://github.com/konradybcio-work/linux/commits/ras_edac