[PATCH] KVM: arm64: Flush hyp bss section after initialization of variables in bss
Marc Zyngier
maz at kernel.org
Mon Jan 20 08:13:56 PST 2025
On Mon, 20 Jan 2025 15:15:14 +0000,
Lokesh Vutla <lokeshvutla at google.com> wrote:
>
> To determine CPU features during initialization, the NVHE hypervisor
s/NVHE/nVHE/
> utilizes sanitized values of the host's CPU feature registers. These
> values, stored in u64 id_aa64*_el1_sys_val variables, are updated by the
> kvm_hyp_init_symbols() function at EL1. To ensure EL2 visibility, the
visibility *with the MMU off*
> data cache needs to be flushed after these updates. However,
> individually flushing each variable using kvm_flush_dcache_to_poc() is
> inefficient.
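(As an aside, and just to illustrate the inefficiency being described: a
per-variable flush would amount to something like the untested sketch
below in kvm_hyp_init_symbols(), assuming the kernel-side aliases can be
cleaned directly by address:

	kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) =
		read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
	/* Clean this single copy to the PoC so EL2 sees it with the MMU off */
	kvm_flush_dcache_to_poc(&kvm_nvhe_sym(id_aa64pfr0_el1_sys_val),
				sizeof(kvm_nvhe_sym(id_aa64pfr0_el1_sys_val)));
	/* ...repeated for every other *_el1_sys_val copy... */

i.e. one cache maintenance operation per variable, hence the wish to do
it in one go.)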
>
> These CPU feature variables are part of the hypervisor's bss section.
> Hence, flush the entire hypervisor bss section once initialization is
> complete.
>
> Motivation for this change:
> * Since the existing variables are not flushed from EL1,
> id_aa64pfr0_el1_sys_val is seen as 0 from EL2.
> * Based on this value, the check_override macro in the hypervisor skips
> updating SVE (cpacr_el1) at finalise_el2_state.
> * The default value of cpacr_el1 enables SVE traps to EL2.
> * With SVE enabled, during the context switch from EL0 -> EL1 (which is
> much later in the boot process), the SVE registers are saved/restored.
> * Since SVE traps are enabled, accessing SVE registers at EL1 causes a
> trap to EL2.
> * However, the hypervisor is not ready to handle SVE traps at this
> stage, causing the kernel crash below during boot:
Drop this section, it doesn't bring much to the discussion.
>
> [ 0.320850][ T1] Run /init as init process
> [ 0.321392][ T1] kvm [1]: nVHE hyp BUG at: [<ffffffc08112ee8c>] __kvm_nvhe_$x.24+0x254/0x254!
> [ 0.321522][ T1] kvm [1]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE
> [ 0.321635][ T1] kvm [1]: Hyp Offset: 0xffffff6e60000000
> [ 0.321710][ T1] Kernel panic - not syncing: HYP panic:
> [ 0.321710][ T1] PS:634023c9 PC:000000522112ee8c ESR:00000000f2000800
> [ 0.321710][ T1] FAR:0000000000000000 CPACR:0000000000310000 PAR:0000000000000800
> [ 0.321710][ T1] VCPU:0000000000000000
> [...]
> [ 0.322251][ T1] Call trace:
> [ 0.322292][ T1] dump_backtrace+0x100/0x180
> [ 0.322355][ T1] show_stack+0x20/0x30
> [ 0.322410][ T1] dump_stack_lvl+0x40/0x88
> [ 0.322471][ T1] dump_stack+0x18/0x24
> [ 0.322523][ T1] panic+0x13c/0x364
> [ 0.322578][ T1] nvhe_hyp_panic_handler+0x148/0x1cc
> [ 0.322646][ T1] do_sve_acc+0xec/0x260
> [ 0.322706][ T1] el0_sve_acc+0x34/0x68
This is essentially content-free, given that there is no
backtrace. Please drop this.
>
> Fixes: 6c30bfb18d0b ("KVM: arm64: Add handlers for protected VM System Registers")
> Suggested-by: Fuad Tabba <tabba at google.com>
> Signed-off-by: Lokesh Vutla <lokeshvutla at google.com>
> ---
> arch/arm64/kvm/arm.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a102c3aebdbc..5d3b2069a2d5 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2661,6 +2661,12 @@ static int __init init_hyp_mode(void)
> }
> }
>
> + /*
> + * Flush entire BSS since part of its data is read while the MMU is off.
> + */
> + kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
> + kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));
> +
> return 0;
>
> out_err:
I don't understand how this fixes anything. At this stage, the
hypervisor has already been initialised, and I expect it will have
evaluated the wrong values.
Even worse, I strongly suspect that by the time you perform this, S2
is enabled on the host, and that the BSS is off-limits, which means it
could fault and send you to lalaland.
Have you actually tested this with upstream?
I would have expected the clean operations to be called from
kvm_hyp_init_symbols(), which runs before EL2 gets initialised in
protected mode.
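Something along these lines (completely untested) is what I'm suggesting,
reusing the same __hyp_bss_start/__hyp_bss_end range your patch already
computes:

	static void kvm_hyp_init_symbols(void)
	{
		kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) =
			read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
		/* ... the other *_el1_sys_val assignments ... */

		/*
		 * Clean the hyp BSS to the PoC so that EL2, which reads
		 * these copies with the MMU off, observes the values
		 * written above.
		 */
		kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
					kvm_ksym_ref(__hyp_bss_end) -
					kvm_ksym_ref(__hyp_bss_start));
	}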
M.
--
Without deviation from the norm, progress is not possible.