[PATCH] KVM: arm64: Flush hyp bss section after initialization of variables in bss

Marc Zyngier maz at kernel.org
Mon Jan 20 08:13:56 PST 2025


On Mon, 20 Jan 2025 15:15:14 +0000,
Lokesh Vutla <lokeshvutla at google.com> wrote:
> 
> To determine CPU features during initialization, the NVHE hypervisor

s/NVHE/nVHE/

> utilizes sanitized values of the host's CPU features registers. These
> values, stored in u64 id_aa64*_el1_sys_val variables, are updated by the
> kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility, the

visibility *with the MMU off*

> data cache needs to be flushed after these updates. However,
> individually flushing each variable using kvm_flush_dcache_to_poc() is
> inefficient.
> 
> These cpu feature variables would be part of the bss section of
> the hypervisor. Hence, flush the entire bss section of hypervisor
> once the initialization is complete.
> 
> Motivation for this change:
> * Since the existing variables are not flushed from EL1, the
>   id_aa64pfr0_el1_sys_val is seen as 0 from EL2.
> * based on this value check_override macro in hypervisor skips
>   updating the sve (cpacr_el1) at finalise_el2_state.
> * The default value for cpacr_el1 enables the sve traps to EL2.
> * With sve enabled, during the context switch from EL0 -> EL1 (which is
>   much later in the boot process), the sve registers are saved/restored.
> * Since sve traps are enabled, accessing sve registers at EL1 caused a
>   trap to EL2.
> * However, hypervisor is not ready to handle sve traps at this stage
>   causing the below kernel crash during the boot:

Drop this section, it doesn't bring much to the discussion.

> 
> [    0.320850][    T1] Run /init as init process
> [    0.321392][    T1] kvm [1]: nVHE hyp BUG at: [<ffffffc08112ee8c>] __kvm_nvhe_$x.24+0x254/0x254!
> [    0.321522][    T1] kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
> [    0.321635][    T1] kvm [1]: Hyp Offset: 0xffffff6e60000000
> [    0.321710][    T1] Kernel panic - not syncing: HYP panic:
> [    0.321710][    T1] PS:634023c9 PC:000000522112ee8c ESR:00000000f2000800
> [    0.321710][    T1] FAR:0000000000000000 CPACR:0000000000310000 PAR:0000000000000800
> [    0.321710][    T1] VCPU:0000000000000000
> [...]
> [    0.322251][    T1] Call trace:
> [    0.322292][    T1]  dump_backtrace+0x100/0x180
> [    0.322355][    T1]  show_stack+0x20/0x30
> [    0.322410][    T1]  dump_stack_lvl+0x40/0x88
> [    0.322471][    T1]  dump_stack+0x18/0x24
> [    0.322523][    T1]  panic+0x13c/0x364
> [    0.322578][    T1]  nvhe_hyp_panic_handler+0x148/0x1cc
> [    0.322646][    T1]  do_sve_acc+0xec/0x260
> [    0.322706][    T1]  el0_sve_acc+0x34/0x68

This is essentially content-free, given that there is no
backtrace. Please drop this.

>
> Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
> Suggested-by: Fuad Tabba <tabba at google.com>
> Signed-off-by: Lokesh Vutla <lokeshvutla at google.com>
> ---
>  arch/arm64/kvm/arm.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a102c3aebdbc..5d3b2069a2d5 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2661,6 +2661,12 @@ static int __init init_hyp_mode(void)
>  		}
>  	}
>  
> +	/*
> +	 * Flush entire BSS since part of its data is read while the MMU is off.
> +	 */
> +	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
> +				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
> +
>  	return 0;
>  
>  out_err:

I don't understand how this fixes anything. At this stage, the
hypervisor has already been initialised, and I expect it will have
evaluated the wrong values.

Even worse, I strongly suspect that by the time you perform this, S2
is enabled on the host, and that the BSS is off-limits. Which means it
could fault and send you to lalaland.

Have you actually tested this with upstream?

I would have expected the clean operations to be called from
kvm_hyp_init_symbols(), which runs before EL2 gets initialised in
protected mode.
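
To illustrate the ordering problem, here is a toy userspace model. It
is not kernel code: the struct, the helper names and the value 0x1122
are all made up for illustration, and the "cache"/"dram" split is only
a crude stand-in for what EL1 sees with the MMU on versus what EL2 sees
with the MMU off.

```c
#include <stdint.h>

/* One BSS variable: 'cache' is EL1's cached view, 'dram' is what an
 * MMU-off read at EL2 observes. Both start at zero, like real BSS. */
struct toy_var {
	uint64_t cache;
	uint64_t dram;
	int dirty;
};

/* EL1 store: lands in the cache only. */
static void el1_write(struct toy_var *v, uint64_t val)
{
	v->cache = val;
	v->dirty = 1;
}

/* Stand-in for a clean to the PoC: push the dirty line out to memory. */
static void clean_to_poc(struct toy_var *v)
{
	if (v->dirty) {
		v->dram = v->cache;
		v->dirty = 0;
	}
}

/* EL2 read with the MMU off: bypasses the cache. */
static uint64_t el2_read_mmu_off(const struct toy_var *v)
{
	return v->dram;
}

/* Returns the value the hypervisor latches during its one-shot init. */
uint64_t hyp_latched_value(int clean_before_hyp_init)
{
	struct toy_var pfr0 = { 0, 0, 0 };
	uint64_t latched;

	el1_write(&pfr0, 0x1122);		/* symbol init at EL1 */
	if (clean_before_hyp_init)
		clean_to_poc(&pfr0);		/* clean right after the writes */

	latched = el2_read_mmu_off(&pfr0);	/* hyp init evaluates the value */

	if (!clean_before_hyp_init)
		clean_to_poc(&pfr0);		/* clean at the end: too late */

	return latched;
}
```

In this model, hyp_latched_value(0) corresponds to cleaning at the end
of init and returns the stale zero, while hyp_latched_value(1)
corresponds to cleaning right after the variables are written and
returns the value that EL1 stored.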

	M.

-- 
Without deviation from the norm, progress is not possible.


