KVM CPU hotplug notifier triggers BUG_ON on arm64
Kristina Martsenko
kristina.martsenko at arm.com
Mon Jul 3 03:36:51 PDT 2023
On 03/07/2023 10:45, Marc Zyngier wrote:
> On Sat, 01 Jul 2023 18:42:28 +0100,
> Oliver Upton <oliver.upton at linux.dev> wrote:
>>
>> Hi Kristina,
>>
>> Thanks for the bug report.
>>
>> On Sat, Jul 01, 2023 at 01:50:52PM +0100, Kristina Martsenko wrote:
>>> Hi,
>>>
>>> When I try to online a CPU on arm64 while a KVM guest is running, I hit a
>>> BUG_ON(preemptible()) (as well as a WARN_ON). See below for the full log.
>>>
>>> This is on kvmarm/next, but seems to have been broken since 6.3. Bisecting it
>>> points at commit:
>>>
>>> 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
>>
>> Makes sense. We were using a spinlock before, which implicitly disables
>> preemption.
>>
>> Well, one way to hack around the problem would be to just cram
>> preempt_{disable,enable}() into kvm_arch_hardware_disable(), but that's
>> kinda gross in the context of cpuhp which isn't migratable in the first
>> place. Let me have a look...
>
> An alternative would be to replace the preemptible() checks with one
> that looks at the migration state, but I'm not sure that's much better
> (it certainly looks more costly).
>
> There is also the fact that most of our per-CPU accessors are already
> using preemption disabling, and this code has a bunch of them. So I'm
> not sure there is a lot to be gained from not disabling preemption
> upfront.
>
> Anyway, as I was able to reproduce the issue under NV, I tested the
> hack below. If anything, I expect it to be a reasonable fix for
> 6.3/6.4, and until we come up with a better approach.
>
> Thanks,
>
> M.
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index aaeae1145359..a28c4ffe4932 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1894,8 +1894,17 @@ static void _kvm_arch_hardware_enable(void *discard)
>
>  int kvm_arch_hardware_enable(void)
>  {
> -	int was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
> +	int was_enabled;
>
> +	/*
> +	 * Most calls to this function are made with migration
> +	 * disabled, but not with preemption disabled. The former is
> +	 * enough to ensure correctness, but most of the helpers
> +	 * expect the latter and will throw a tantrum otherwise.
> +	 */
> +	preempt_disable();
> +
> +	was_enabled = __this_cpu_read(kvm_arm_hardware_enabled);
>  	_kvm_arch_hardware_enable(NULL);
>
>  	if (!was_enabled) {
> @@ -1903,6 +1912,8 @@ int kvm_arch_hardware_enable(void)
>  		kvm_timer_cpu_up();
>  	}
>
> +	preempt_enable();
> +
>  	return 0;
>  }
This fixes the issue for me.
Tested-by: Kristina Martsenko <kristina.martsenko at arm.com>
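
For anyone reading along without a kernel tree to hand, here is a rough
user-space model of why wrapping the body in preempt_disable() and
preempt_enable() silences the check. It is only a sketch, not kernel code:
every model_* name is a hypothetical stand-in, the "preempt count" is a plain
integer, and the helper returns false where the real code would hit
BUG_ON(preemptible()).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy user-space model, NOT kernel code. All model_* names are
 * hypothetical stand-ins invented for this illustration.
 */
int model_preempt_count;   /* models the per-task preempt count */
int model_hw_enabled;      /* models one CPU's kvm_arm_hardware_enabled */

void model_preempt_disable(void)
{
	model_preempt_count++;
}

void model_preempt_enable(void)
{
	assert(model_preempt_count > 0);   /* calls must be balanced */
	model_preempt_count--;
}

bool model_preemptible(void)
{
	return model_preempt_count == 0;
}

/*
 * Models a helper that insists on a non-preemptible caller.
 * Returns false where the kernel would trigger BUG_ON(preemptible()).
 */
bool model_percpu_helper(void)
{
	if (model_preemptible())
		return false;
	model_hw_enabled = 1;
	return true;
}

/* Models the pre-patch path: the helper is reached while preemptible. */
bool model_enable_unfixed(void)
{
	return model_percpu_helper();
}

/* Models the patched path: preemption is disabled around the helper. */
bool model_enable_fixed(void)
{
	bool ok;

	model_preempt_disable();
	ok = model_percpu_helper();
	model_preempt_enable();
	return ok;
}
```

In this model, model_enable_unfixed() reports the failure and
model_enable_fixed() succeeds and leaves the count balanced, which matches
the behaviour I see with and without the patch.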
Thanks,
Kristina