EDAC driver for ARMv8 L1/L2 cache

York Sun york.sun at nxp.com
Mon Jan 15 08:19:09 PST 2018


On 01/15/2018 06:32 AM, Borislav Petkov wrote:
> On Mon, Jan 15, 2018 at 02:21:54PM +0000, Mark Rutland wrote:
>> I'm not sure it's possible to cover all potential EDAC implementations
>> behind the same driver.
>>
>> If we need these drivers, they should be <cpuname>_edac or <soc>_edac.
> 
> Yuck, I wanted to avoid that...
> 
> Oh well, can we at least share the barebones design and exchange
> registers/DT only or is it more complicated than that?
> 

I have a different plan for the driver. Since I don't get an interrupt on
correctable errors, my thinking is to use a dynamic polling interval. As
more correctable errors are seen, the polling interval is decreased until
it reaches a threshold, at which point further action needs to be taken
(at least I would raise an error message). The idea is that an
uncorrectable error is preceded by an increasing number of correctable
errors. I would use a per-CPU data structure so we know which core has
increasing errors. For an embedded system, we may shut down the core(s)
with errors to protect the system from critical failure. The L2 cache is
a slightly different case because it is shared within a cluster: we may
have to shut down the whole cluster if we have excessive correctable
errors. A server may simply shut down. Of course the decision is not made
by the driver, but by RAS or other monitoring policy.
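
To make the polling idea concrete, here is a rough userspace sketch of the
interval logic only. It is not driver code. The names (cpu_poll_state,
poll_state_update), the 1000 ms and 10 ms limits, and the halving step are
all placeholders I made up for illustration. Lengthening the interval again
after a clean poll is also an assumption; the plan above only talks about
decreasing it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits; no concrete values have been decided. */
#define POLL_MAX_MS 1000u   /* relaxed interval when no errors are seen */
#define POLL_MIN_MS 10u     /* threshold: shortest interval allowed */

/* One instance per core, so we know which core has increasing errors. */
struct cpu_poll_state {
    uint32_t interval_ms;   /* current polling interval */
    uint64_t ce_total;      /* correctable errors seen so far */
};

void poll_state_init(struct cpu_poll_state *s)
{
    s->interval_ms = POLL_MAX_MS;
    s->ce_total = 0;
}

/*
 * Feed in the number of new correctable errors found by one poll.
 * Returns true once the interval has been driven down to the
 * threshold while errors are still arriving, i.e. when the caller
 * should report to whatever RAS/monitoring policy is in place.
 */
bool poll_state_update(struct cpu_poll_state *s, uint32_t new_ce)
{
    s->ce_total += new_ce;

    if (new_ce == 0) {
        /* Assumption: back off again after a clean poll. */
        if (s->interval_ms <= POLL_MAX_MS / 2)
            s->interval_ms *= 2;
        else
            s->interval_ms = POLL_MAX_MS;
        return false;
    }

    /* Errors seen: poll faster, but never below the threshold. */
    if (s->interval_ms / 2 > POLL_MIN_MS) {
        s->interval_ms /= 2;
        return false;
    }

    s->interval_ms = POLL_MIN_MS;
    return true;
}
```

In a real driver the state would live in per-CPU storage and the update
would run from the polling callback, which would then re-arm itself with
the new interval_ms. The sketch only returns a flag at the threshold and
leaves the shutdown decision to the policy layer.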

Any comments are welcome.

York


