KVM: arm64: Regression in at least linux-6.1.y tree with recent FPSIMD/SVE/SME fix
Will Deacon
will at kernel.org
Wed Oct 1 02:32:46 PDT 2025
Hi Kenneth,

Thanks for the report.

On Tue, Sep 30, 2025 at 05:31:38PM +0000, Kenneth Van Alstyne wrote:
> Sending via plain text email -- apologies if you receive this twice.
>
> If this isn't the process for reporting a regression in a LTS kernel per
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html,
> I'm happy to follow another process.
>
> Kernel 6.1.149 introduced a regression, at least on our ARM Cortex
> A57-based platforms, via commit 8f4dc4e54eed4bebb18390305eb1f721c00457e1
> in arch/arm64/kernel/fpsimd.c where booting KVM VMs eventually leads to a
> spinlock recursion BUG and crash of the box.
>
> Reverting that commit via the below reverts to the old (working) behavior:
>
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 837d1937300a57..bc42163a7fd1f0 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1851,10 +1851,10 @@ void fpsimd_save_and_flush_cpu_state(void)
>  	if (!system_supports_fpsimd())
>  		return;
>  	WARN_ON(preemptible());
> -	get_cpu_fpsimd_context();
> +	__get_cpu_fpsimd_context();
>  	fpsimd_save();
>  	fpsimd_flush_cpu_state();
> -	put_cpu_fpsimd_context();
> +	__put_cpu_fpsimd_context();
>  }
>  
>  #ifdef CONFIG_KERNEL_MODE_NEON

Hmm, the problem with doing that is it will reintroduce the bug that
8f4dc4e54eed ("KVM: arm64: Fix kernel BUG() due to bad backport of
FPSIMD/SVE/SME fix") was trying to fix (see the backtrace in the commit
message). So the old behaviour is still broken, just in a slightly
different way.

> It's not entirely clear to me if this is specific to our firmware,
> specific to ARM Cortex A57, or more systemic as we lack sufficiently
> differentiated hardware to know. I've tested on the latest 6.1 kernel in
> addition to the one in the log below and have also tested a number of
> firmware versions available for these boxes.
>
> Steps to reproduce:
>
> Boot VM in qemu-system-aarch64 with "-accel kvm" and "-cpu host" flags set -- no other arguments seem to matter
> Generate CPU load in VM
>
> Kernel log:
>
> [sjc1] root@si-compute-kvm-e0fff70016b4:/# [ 805.905413] BUG: spinlock recursion on CPU#7, CPU 3/KVM/57616
> [ 805.905452] lock: 0xffff3045ef850240, .magic: dead4ead, .owner: CPU 3/KVM/57616, .owner_cpu: 7
> [ 805.905477] CPU: 7 PID: 57616 Comm: CPU 3/KVM Tainted: G O 6.1.152 #1
> [ 805.905495] Hardware name: SoftIron SoftIron Platform Mainboard/SoftIron Platform Mainboard, BIOS 1.31 May 11 2023
> [ 805.905516] Call trace:
> [ 805.905524] dump_backtrace+0xe4/0x110
> [ 805.905538] show_stack+0x20/0x30
> [ 805.905548] dump_stack_lvl+0x6c/0x88
> [ 805.905561] dump_stack+0x18/0x34
> [ 805.905571] spin_dump+0x98/0xac
> [ 805.905583] do_raw_spin_lock+0x70/0x128
> [ 805.905596] _raw_spin_lock+0x18/0x28
> [ 805.905607] raw_spin_rq_lock_nested+0x18/0x28
> [ 805.905620] update_blocked_averages+0x70/0x550
> [ 805.905634] run_rebalance_domains+0x50/0x70
> [ 805.905645] handle_softirqs+0x198/0x328
> [ 805.905659] __do_softirq+0x1c/0x28
> [ 805.905669] ____do_softirq+0x18/0x28
> [ 805.905680] call_on_irq_stack+0x30/0x48
> [ 805.905691] do_softirq_own_stack+0x24/0x30
> [ 805.905703] do_softirq+0x74/0x90
> [ 805.905714] __local_bh_enable_ip+0x64/0x80

Argh, this is because we can't simply mask/unmask softirqs and so when
they get re-enabled we process anything pending. I _think_ irqs are
disabled at this point, so perhaps we should only bother with
disabling/enabling softirqs if hardirqs are enabled, a bit like the hack
Ard had in:

https://lore.kernel.org/all/20250924152651.3328941-13-ardb+git@google.com/

Hacky diff at the end.

> [ 805.905727] fpsimd_save_and_flush_cpu_state+0x5c/0x68
> [ 805.905740] kvm_arch_vcpu_put_fp+0x4c/0x88
> [ 805.905752] kvm_arch_vcpu_put+0x28/0x88
> [ 805.905764] kvm_sched_out+0x38/0x58

(I think we run context_switch() => prepare_task_switch() here, so irqs
are disabled)
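
To make the failure mode concrete, here is a minimal user-space sketch in C. It is a toy single-CPU model, not kernel code: every `model_*` function and the `struct cpu_model` fields are hypothetical names invented for illustration. It assumes, as suggested above, that the sched-out path runs with irqs disabled and the runqueue lock held while a softirq is already pending. Under that assumption, re-enabling softirqs runs the pending handler, which tries to take the lock that is already held; skipping the disable/enable pair when irqs are off avoids that.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model. Every name here is hypothetical, not a kernel API. */
struct cpu_model {
	bool irqs_off;         /* stands in for irqs_disabled() */
	bool rq_lock_held;     /* stands in for the runqueue spinlock */
	bool softirq_pending;  /* a softirq was raised and has not run yet */
	int  bh_disable_depth; /* stands in for the softirq-disable count */
	int  recursions;       /* times the already-held lock was taken again */
};

/* Models run_rebalance_domains() reaching raw_spin_rq_lock_nested(). */
static void model_run_softirq(struct cpu_model *c)
{
	if (c->rq_lock_held)
		c->recursions++; /* the real kernel reports spinlock recursion */
	c->softirq_pending = false;
}

static void model_bh_disable(struct cpu_model *c)
{
	c->bh_disable_depth++;
}

/* Re-enabling is not a plain unmask: pending work runs right here. */
static void model_bh_enable(struct cpu_model *c)
{
	assert(c->bh_disable_depth > 0);
	if (--c->bh_disable_depth == 0 && c->softirq_pending)
		model_run_softirq(c);
}

/*
 * Models the sched-out path: irqs off, runqueue lock held, softirq pending.
 * With guard == true, softirqs are only toggled when irqs are enabled,
 * mirroring the conditional in the hacky diff. Returns the number of lock
 * recursions observed.
 */
static int model_sched_out(bool guard)
{
	struct cpu_model c = {
		.irqs_off = true,
		.rq_lock_held = true,
		.softirq_pending = true,
	};

	if (!guard || !c.irqs_off)
		model_bh_disable(&c);
	/* fpsimd_save() and fpsimd_flush_cpu_state() would run here */
	if (!guard || !c.irqs_off)
		model_bh_enable(&c);

	return c.recursions;
}
```

In this model, `model_sched_out(false)` returns 1 (the pending softirq runs on re-enable and hits the held lock), while `model_sched_out(true)` returns 0 because the disable/enable pair is skipped entirely. The model says nothing about whether skipping is safe in every caller; it only illustrates the recursion in the trace.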

Will

--->8

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index fc51cdd5aaa7..a79df0804d67 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -184,7 +184,8 @@ static void __get_cpu_fpsimd_context(void)
  */
 static void get_cpu_fpsimd_context(void)
 {
-	local_bh_disable();
+	if (!irqs_disabled())
+		local_bh_disable();
 	__get_cpu_fpsimd_context();
 }
 
@@ -205,7 +206,8 @@ static void __put_cpu_fpsimd_context(void)
 static void put_cpu_fpsimd_context(void)
 {
 	__put_cpu_fpsimd_context();
-	local_bh_enable();
+	if (!irqs_disabled())
+		local_bh_enable();
 }
 
 static bool have_cpu_fpsimd_context(void)