[PATCH 4/8] ras: aest: Add panic_on_ue module parameter
Umang Chheda
umang.chheda at oss.qualcomm.com
Mon May 11 23:51:22 PDT 2026
Hi Ruidong,
On 5/6/2026 1:36 PM, Ruidong Tian wrote:
>
>
> On 2026/5/5 20:23, Umang Chheda wrote:
>> The driver unconditionally calls panic() whenever an uncontainable or
>> unrecoverable UE (UET_UC or UET_UEU) is detected. There is no way
>> for the user to suppress this behaviour, which makes it difficult to
>> test UE injection or to run in environments where a kernel panic on
>> every UE is undesirable.
>>
>> Add a module parameter `aest_panic_on_ue`. When set to 0, the driver
>> logs the UE and continues instead of panicking.
>>
>> Usage:
>> # Boot time (kernel cmdline)
>> aest.aest_panic_on_ue=0
>>
>> # Runtime
>> echo 0 > /sys/module/aest/parameters/aest_panic_on_ue
>>
>> Signed-off-by: Umang Chheda <umang.chheda at oss.qualcomm.com>
>
> Hi Umang,
>
> Thanks for the patch.
>
> I understand that this parameter is intended to facilitate UE injection
> testing and to avoid kernel panics in certain environments. However, we
> need to carefully consider the potential risks.
>
> When a UC (Uncontainable Error) or UEU (Unrecoverable Error) occurs, the
> hardware state may be unpredictable, and data integrity cannot be
> guaranteed. Allowing the system to continue running instead of panicking
> in these scenarios could lead to silent data corruption or other
> unforeseen side effects, which poses a significant risk to system
> stability.
>
> For the sake of robustness and data safety, I do not believe we should
> expose an interface that allows users to suppress panic on such critical
> errors.
>
> If the goal is primarily to ease testing, I suggest handling this via
> local driver modifications in your test environment rather than
> upstreaming it as a configurable runtime option.

IMO, a module parameter would be useful here. In some cases outside of
test scenarios, it is necessary to avoid triggering a kernel panic on a UE.

Would it make sense to keep the default behavior as panic on UE, while
also providing a module parameter to disable it when needed? This way,
we can preserve the default safety behavior while avoiding the need for
local rebuilds just to change this setting.
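
A minimal userspace sketch of that proposal, assuming the flag defaults to
true so that panic remains the out-of-the-box behaviour. Only
`aest_panic_on_ue` and the `fake` argument come from the patch; the enum and
`ue_decide()` are hypothetical names used for illustration and are not part
of the driver. It mirrors the if / else if / else chain in the quoted hunk.

```c
#include <stdbool.h>

/* Stand-in for the module parameter; assumed to default to "panic". */
static bool aest_panic_on_ue = true;

/* Hypothetical outcome labels, not taken from the driver. */
enum ue_action {
	UE_SKIP_INJECTED,	/* simulated error: never panic */
	UE_PANIC,		/* real UE, parameter left at its default */
	UE_LOG_ONLY,		/* real UE, panic suppressed by the user */
};

/* Decide what to do with an uncontainable or unrecoverable UE. */
static enum ue_action ue_decide(bool fake)
{
	if (fake)
		return UE_SKIP_INJECTED;
	if (aest_panic_on_ue)
		return UE_PANIC;
	return UE_LOG_ONLY;
}
```

With this shape, a user who never touches the parameter keeps today's
behaviour, and only an explicit write of 0 selects the log-only path.
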
Thanks,
Umang
>
> Best regards,
> Ruidong
>
>> ---
>> drivers/ras/aest/aest-core.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
>> index b4f4c975da1d..9ce782a66edf 100644
>> --- a/drivers/ras/aest/aest-core.c
>> +++ b/drivers/ras/aest/aest-core.c
>> @@ -22,6 +22,11 @@ DEFINE_PER_CPU(struct aest_device, percpu_adev);
>> #undef pr_fmt
>> #define pr_fmt(fmt) "AEST: " fmt
>> +static bool aest_panic_on_ue = true;
>> +module_param(aest_panic_on_ue, bool, 0644);
>> +MODULE_PARM_DESC(aest_panic_on_ue,
>> + "Panic on unrecoverable error: 0=off 1=on (default: 1)");
>> +
>> #ifdef CONFIG_DEBUG_FS
>> struct dentry *aest_debugfs;
>> #endif
>> @@ -342,9 +347,11 @@ void aest_proc_record(struct aest_record *record, void *data, bool fake)
>> aest_record_info(
>> record,
>> "Simulated error! Skip panic due to fault injection\n");
>> - else
>> + else if (aest_panic_on_ue)
>> aest_panic(record, &regs,
>> "AEST: unrecoverable error encountered");
>> + else
>> + aest_record_err(record, "UE detected, panic suppressed\n");
>> }
>> aest_log(record, &regs);
>>
>