[PATCH v3] arm64: sdei: abort running SDEI handlers during crash

James Morse james.morse at arm.com
Fri Jun 23 07:48:16 PDT 2023


Hi Scott,

On 07/06/2023 20:55, D Scott Phillips wrote:
> Interrupts are blocked in SDEI context, per the SDEI spec: "The client
> interrupts cannot preempt the event handler." If we crashed in the SDEI
> handler-running context (as with ACPI's AGDI) then we need to clean up the
> SDEI state before proceeding to the crash kernel so that the crash kernel
> can have working interrupts.
> 
> Track the active SDEI handler per-cpu so that we can COMPLETE_AND_RESUME
> the handler, discarding the interrupted context.

I still argue this is a bug in the firmware. The text you indicate was intended to mean
'set PSTATE.I'. The whole 'GIC abstraction' use-case was added to SDEI quite late. You'll
note linux doesn't support any of that. The normal-world OS was never supposed to have
to guess whether


Regardless, I've not managed to convince the firmware people to fix this.


> diff --git a/arch/arm64/include/asm/sdei.h b/arch/arm64/include/asm/sdei.h
> index 4292d9bafb9d..dc2f038a61ef 100644
> --- a/arch/arm64/include/asm/sdei.h
> +++ b/arch/arm64/include/asm/sdei.h
> @@ -17,6 +17,11 @@
>  
>  #include <asm/virt.h>
>  
> +#ifdef CONFIG_ARM_SDE_INTERFACE
> +DECLARE_PER_CPU(struct sdei_registered_event *, sdei_active_normal_event);
> +DECLARE_PER_CPU(struct sdei_registered_event *, sdei_active_critical_event);
> +#endif

Please drop these #ifdef guards, they prevent using IS_ENABLED() to let the compiler check
the code, but dead-code eliminate it. This makes it harder for the CI to catch any issues
without having to 'get lucky' with the Kconfig options.


> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ab2a6e33c052..e49f72b5fb63 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -1007,6 +1007,12 @@ SYM_CODE_START(__sdei_asm_handler)
>  	ldrb	w4, [x19, #SDEI_EVENT_PRIORITY]
>  #endif
>  
> +	cbnz	w4, 1f

Is this meant to be considering SDEI_EVENT_PRIORITY? That is #ifdef'd for VMAP_STACK/SCS.
Without those options set, you're checking to the stack pointer here.
You probably need to drop the #ifdefs.


> +	adr_this_cpu dst=x5, sym=sdei_active_normal_event, tmp=x6
> +	b	2f
> +1:	adr_this_cpu dst=x5, sym=sdei_active_critical_event, tmp=x6
> +2:	str	x19, [x5]

Almost every assembly block in here has a comment describing what its doing.
|	/* Store the registered-event for crash_smp_send_stop() */


>  #ifdef CONFIG_VMAP_STACK
>  	/*
>  	 * entry.S may have been using sp as a scratch register, find whether
> @@ -1072,6 +1078,13 @@ SYM_CODE_START(__sdei_asm_handler)
>  
>  	ldr_l	x2, sdei_exit_mode


|	/* Clear the registered-event seen by crash_smp_send_stop() */

> +	ldrb	w3, [x4, #SDEI_EVENT_PRIORITY]
> +	cbnz	w3, 1f
> +	adr_this_cpu dst=x5, sym=sdei_active_normal_event, tmp=x6
> +	b	2f
> +1:	adr_this_cpu dst=x5, sym=sdei_active_critical_event, tmp=x6
> +2:	str	xzr, [x5]
> +
>  alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
>  	sdei_handler_exit exit_mode=x2
>  alternative_else_nop_endif


> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4e8327264255..ea1595b51590 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c

> @@ -1069,7 +1067,28 @@ void crash_smp_send_stop(void)
>  		pr_warn("SMP: failed to stop secondary CPUs %*pbl\n",
>  			cpumask_pr_args(&mask));
>  
> +skip_ipi:
>  	sdei_mask_local_cpu();
> +
> +#ifdef CONFIG_ARM_SDE_INTERFACE

Please make this a C "if(IS_ENABLED(CONFIG_ARM_SDE_INTERFACE)) {", this lets the compiler
check this is all valid, before dead-code eliminating it if the Kconfig option isn't
selected. This lets the CI systems check this code regardless of the Kconfig option.


> +	/*
> +	 * If the crash happened in an SDEI event handler then we need to
> +	 * finish the handler with the firmware so that we can have working
> +	 * interrupts in the crash kernel.
> +	 */
> +	if (__this_cpu_read(sdei_active_critical_event)) {
> +		pr_warn("SMP: stopped CPUs from SDEI critical event handler "
> +			"context, attempting to finish handler.\n");
> +		sdei_handler_abort();
> +		__this_cpu_write(sdei_active_critical_event, NULL);
> +	}
> +	if (__this_cpu_read(sdei_active_normal_event)) {
> +		pr_warn("SMP: stopped CPUs from SDEI normal event handler "
> +			"context, attempting to finish handler.\n");
> +		sdei_handler_abort();
> +		__this_cpu_write(sdei_active_normal_event, NULL);
> +	}
> +#endif

You might want to wrap this up as a sdei_handler_abort(), just to hide the ugly details
from here. The asm version would then need a double-underscore prefix.


>  }


With the entry.S #ifdef issue fixed:
Reviewed-by: James Morse <james.morse at arm.com>


Thanks,

James



More information about the linux-arm-kernel mailing list