[PATCH v5 00/17] ARM Error Source Table V2 Support

Ruidong Tian tianruidong at linux.alibaba.com
Fri Jan 9 04:26:27 PST 2026



On 2026/1/9 18:34, Borislav Petkov wrote:
> On Mon, Jan 05, 2026 at 05:12:25PM +0800, Ruidong Tian wrote:
>>> What is a "RAS node"?
>> A RAS node is the hardware interface for error reporting and control,
>> consisting of one or more register sets (a collection of RAS records). It is
>> responsible for error logging and interrupt signaling[0].
> 
> OMG, one more meaning for the word "node". Because we're not ambiguous enough.
> 
> /facepalm
> 
>> A single hardware component can feature multiple RAS nodes. For example, a
>> memory controller is treated as a "RAS device", where each memory channel
>> has its own RAS node. Interrupts generated by these nodes are typically
>> aggregated into a single interrupt line managed at the RAS device level.
> 
> Nomenclaturial tragedy, I'd say.
> 
>> Comparison with x86 MCA:
>>
>> RAS record ≈ MCA bank.
>> RAS node ≈ A set of MCA banks + CMCI on a core.
>>
>> The key difference lies in uncore handling: x86 typically maps uncore errors
>> (like those from a memory controller) into core-based MCA banks. In
>> contrast, ARM requires uncore components to provide their own standalone RAS
>> nodes. When a component requires multiple such nodes, they are grouped and
>> managed as a "RAS device" in AEST driver.
>>
>> [0]: https://developer.arm.com/documentation/ihi0100/latest
> 
> Yah, thanks for explaining.
> 
>>> The ATL is very AMD-specific. What does "conceptually similar" mean exactly?
>> By "conceptually similar," I mean that both ARM and AMD share the same
>> functional requirement: translating between a System Physical Address (SPA)
>> and a device-specific address (like a DRAM address) for RAS purposes.
>>
>> The goal here is not to share the hardware-specific translation logic, but
>> to provide a unified interface (an abstraction layer). The actual
>> implementation of the translation remains entirely architecture-specific.
> 
> And why do we need an arch-overlapping unified interface?
> 
> You can just as well have aest_convert_la_to_spa() and none of that "unifying"
> churn.
> 
You're right, that would be much cleaner. I was trying too hard to keep
the interface unified across architectures. I'll drop the unified
interface and use a direct, AEST-specific helper instead in the next
version. Thanks for the feedback!
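
For illustration only, a direct helper could look roughly like the
userspace sketch below. Only the name aest_convert_la_to_spa() comes
from this thread. The signature, the struct aest_node_sketch type, the
per-node callback, and the fixed-offset toy translation are all
assumptions made up for the example, not code from the series. "LA" is
assumed here to mean the device-specific address.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a RAS node. The real translation logic is
 * hardware-specific, so the sketch only stores a per-node callback.
 */
struct aest_node_sketch {
	const char *name;
	/* Returns 0 and fills *spa on success, a negative errno otherwise. */
	int (*la_to_spa)(uint64_t la, uint64_t *spa);
};

/* Toy translation, for illustration only: add a fixed DRAM base. */
#define SKETCH_DRAM_BASE 0x80000000ULL

static int sketch_channel_la_to_spa(uint64_t la, uint64_t *spa)
{
	*spa = SKETCH_DRAM_BASE + la;
	return 0;
}

/*
 * Direct AEST helper: no cross-architecture abstraction layer, just a
 * checked call into the node's own translation routine.
 * The signature is an assumption, not taken from the patch series.
 */
static int aest_convert_la_to_spa(const struct aest_node_sketch *node,
				  uint64_t la, uint64_t *spa)
{
	if (!node || !spa)
		return -EINVAL;
	if (!node->la_to_spa)
		return -EOPNOTSUPP;

	return node->la_to_spa(la, spa);
}
```

A caller would fill in a node with its own translation routine and call
aest_convert_la_to_spa(&node, la, &spa). A node without a translation
routine simply gets -EOPNOTSUPP back.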



More information about the linux-arm-kernel mailing list