[PATCH 1/2] firmware: smccc: add timeout, touch wdt

Andre Przywara andre.przywara at arm.com
Tue Feb 24 02:58:12 PST 2026


Hi Veda,

On 2/10/26 23:40, Vedashree Vidwans wrote:
> Enhance PRIME/ACTIVATION functions to touch watchdog and implement
> timeout mechanism. This update ensures that any potential hangs are
> detected promptly and that the LFA process is allocated sufficient
> execution time before the watchdog timer expires. These changes improve
> overall system reliability by reducing the risk of undetected process
> stalls and unexpected watchdog resets.

Many thanks for that, I think it's a very good idea to take care of the 
watchdog and to avoid an infinite loop in the AGAIN case.
I have some comments about some details below ....

> Signed-off-by: Vedashree Vidwans <vvidwans at nvidia.com>
> ---
>   drivers/firmware/smccc/lfa_fw.c | 40 +++++++++++++++++++++++++++++++++
>   1 file changed, 40 insertions(+)
> 
> diff --git a/drivers/firmware/smccc/lfa_fw.c b/drivers/firmware/smccc/lfa_fw.c
> index da6b54fe1685..b0ace6fc8dac 100644
> --- a/drivers/firmware/smccc/lfa_fw.c
> +++ b/drivers/firmware/smccc/lfa_fw.c
> @@ -17,6 +17,9 @@
>   #include <linux/array_size.h>
>   #include <linux/list.h>
>   #include <linux/mutex.h>
> +#include <linux/nmi.h>
> +#include <linux/ktime.h>
> +#include <linux/delay.h>
>   
>   #undef pr_fmt
>   #define pr_fmt(fmt) "Arm LFA: " fmt
> @@ -37,6 +40,14 @@
>   #define LFA_PRIME_CALL_AGAIN		BIT(0)
>   #define LFA_ACTIVATE_CALL_AGAIN		BIT(0)
>   
> +/* Prime loop limits, TODO: tune after testing */
> +#define LFA_PRIME_BUDGET_US		30000000	/* 30s cap */
> +#define LFA_PRIME_POLL_DELAY_US		10		/* 10us between polls */
> +
> +/* Activation loop limits, TODO: tune after testing */
> +#define LFA_ACTIVATE_BUDGET_US		20000000	/* 20s cap */
> +#define LFA_ACTIVATE_POLL_DELAY_US	10		/* 10us between polls */
> +
>   /* LFA return values */
>   #define LFA_SUCCESS			0
>   #define LFA_NOT_SUPPORTED		1
> @@ -219,6 +230,7 @@ static int call_lfa_activate(void *data)
>   	struct image_props *attrs = data;
>   	struct arm_smccc_1_2_regs args = { 0 };
>   	struct arm_smccc_1_2_regs res = { 0 };
> +	ktime_t end = ktime_add_us(ktime_get(), LFA_ACTIVATE_BUDGET_US);
>   
>   	args.a0 = LFA_1_0_FN_ACTIVATE;
>   	args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
> @@ -232,6 +244,8 @@ static int call_lfa_activate(void *data)
>   	args.a2 = !(attrs->cpu_rendezvous_forced || attrs->cpu_rendezvous);
>   
>   	for (;;) {
> +		/* Touch watchdog, ACTIVATE shouldn't take longer than watchdog_thresh */
> +		touch_nmi_watchdog();
>   		arm_smccc_1_2_invoke(&args, &res);
>   
>   		if ((long)res.a0 < 0) {
> @@ -241,6 +255,15 @@ static int call_lfa_activate(void *data)
>   		}
>   		if (!(res.a1 & LFA_ACTIVATE_CALL_AGAIN))
>   			break; /* ACTIVATE successful */
> +
> +		/* SMC returned with call_again flag set */
> +		if (ktime_before(ktime_get(), end)) {
> +			udelay(LFA_ACTIVATE_POLL_DELAY_US);

I don't think we should wait here at all, and definitely not with 
udelay: https://docs.kernel.org/timers/delay_sleep_functions.html

Instead we should move the "call again" (and timeout) mechanism out of 
this function, into activate_fw_image(), so that we exit the 
stop_machine(). Otherwise we would still block everything. Doing it 
there, where we should be preemptible, would give the kernel a chance to 
do some housekeeping. If there is nothing for the kernel to do, then I 
think it's fine to immediately call lfa_activate() again, after a 
cond_resched(), for instance.

> +			continue;
> +		}
> +
> +		pr_err("ACTIVATE for image %s timed out", attrs->image_name);
> +		return -ETIMEDOUT;
>   	}
>   
>   	return res.a0;
> @@ -290,6 +313,7 @@ static int prime_fw_image(struct image_props *attrs)
>   {
>   	struct arm_smccc_1_2_regs args = { 0 };
>   	struct arm_smccc_1_2_regs res = { 0 };
> +	ktime_t end = ktime_add_us(ktime_get(), LFA_PRIME_BUDGET_US);
>   	int ret;
>   
>   	mutex_lock(&lfa_lock);
> @@ -317,6 +341,8 @@ static int prime_fw_image(struct image_props *attrs)
>   	args.a0 = LFA_1_0_FN_PRIME;
>   	args.a1 = attrs->fw_seq_id; /* fw_seq_id under consideration */
>   	for (;;) {
> +		/* Touch watchdog, PRIME shouldn't take longer than watchdog_thresh */
> +		touch_nmi_watchdog();
>   		arm_smccc_1_2_invoke(&args, &res);
>   
>   		if ((long)res.a0 < 0) {
> @@ -328,6 +354,20 @@ static int prime_fw_image(struct image_props *attrs)
>   		}
>   		if (!(res.a1 & LFA_PRIME_CALL_AGAIN))
>   			break; /* PRIME successful */
> +
> +		/* SMC returned with call_again flag set */
> +		if (ktime_before(ktime_get(), end)) {
> +			udelay(LFA_PRIME_POLL_DELAY_US);

same comment here, please no udelay().
This should also avoid the discussion about the exact values of the 
sleep periods.
I'd just have one generous timeout (a few seconds, basically what your 
BUDGET values do above), to avoid looping forever in case of a firmware 
bug, for instance.

Cheers,
Andre

> +			continue;
> +		}
> +
> +		pr_err("LFA_PRIME for image %s timed out", attrs->image_name);
> +		mutex_unlock(&lfa_lock);
> +
> +		ret = lfa_cancel(attrs);
> +		if (ret != 0)
> +			return ret;
> +		return -ETIMEDOUT;
>   	}
>   
>   	mutex_unlock(&lfa_lock);




More information about the linux-arm-kernel mailing list