[PATCH 1/2] x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline
Vitaly Kuznetsov
vkuznets at redhat.com
Mon May 22 01:56:05 PDT 2023
Michael Kelley <mikelley at microsoft.com> writes:
> These commits
>
> a494aef23dfc ("PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg")
> 2c6ba4216844 ("PCI: hv: Enable PCI pass-thru devices in Confidential VMs")
>
> update the Hyper-V virtual PCI driver to use the hyperv_pcpu_input_arg
> because that memory will be correctly marked as decrypted or encrypted
> for all VM types (CoCo or normal). But problems ensue when CPUs in the
> VM go online or offline after virtual PCI devices have been configured.
>
> When a CPU is brought online, the hyperv_pcpu_input_arg for that CPU is
> initialized by hv_cpu_init() running under state CPUHP_AP_ONLINE_DYN.
> But this state occurs after state CPUHP_AP_IRQ_AFFINITY_ONLINE, which
> may call the virtual PCI driver and fault trying to use the as yet
> uninitialized hyperv_pcpu_input_arg. A similar problem occurs in a CoCo
> VM if the MMIO read and write hypercalls are used from state
> CPUHP_AP_IRQ_AFFINITY_ONLINE.
>
> When a CPU is taken offline, IRQs may be reassigned in state
> CPUHP_TEARDOWN_CPU. Again, the virtual PCI driver may fault trying to
> use the hyperv_pcpu_input_arg that has already been freed by a
> higher state.
>
> Fix the onlining problem by adding state CPUHP_AP_HYPERV_ONLINE
> immediately after CPUHP_AP_ONLINE_IDLE (similar to CPUHP_AP_KVM_ONLINE)
> and before CPUHP_AP_IRQ_AFFINITY_ONLINE. Use this new state for
> Hyper-V initialization so that hyperv_pcpu_input_arg is allocated
> early enough.
>
> Fix the offlining problem by not freeing hyperv_pcpu_input_arg when
> a CPU goes offline. Retain the allocated memory, and reuse it if
> the CPU comes back online later.
>
> Signed-off-by: Michael Kelley <mikelley at microsoft.com>
> ---
> arch/x86/hyperv/hv_init.c | 2 +-
> drivers/hv/hv_common.c | 48 +++++++++++++++++++++++-----------------------
> include/linux/cpuhotplug.h | 1 +
> 3 files changed, 26 insertions(+), 25 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index a5f9474..6c04b52 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -416,7 +416,7 @@ void __init hyperv_init(void)
> goto free_vp_assist_page;
> }
>
> - cpuhp = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
> + cpuhp = cpuhp_setup_state(CPUHP_AP_HYPERV_ONLINE, "x86/hyperv_init:online",
> hv_cpu_init, hv_cpu_die);
> if (cpuhp < 0)
> goto free_ghcb_page;
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 64f9cec..4c5cee4 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -364,13 +364,20 @@ int hv_common_cpu_init(unsigned int cpu)
> flags = irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL;
>
> inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - *inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> - if (!(*inputarg))
> - return -ENOMEM;
>
> - if (hv_root_partition) {
> - outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> - *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> + /*
> + * hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory is already
> + * allocated if this CPU was previously online and then taken offline
> + */
> + if (!*inputarg) {
> + *inputarg = kmalloc(pgcount * HV_HYP_PAGE_SIZE, flags);
> + if (!(*inputarg))
> + return -ENOMEM;
> +
> + if (hv_root_partition) {
> + outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *outputarg = (char *)(*inputarg) + HV_HYP_PAGE_SIZE;
> + }
> }
>
> msr_vp_index = hv_get_register(HV_REGISTER_VP_INDEX);
> @@ -385,24 +392,17 @@ int hv_common_cpu_init(unsigned int cpu)
>
> int hv_common_cpu_die(unsigned int cpu)
> {
> - unsigned long flags;
> - void **inputarg, **outputarg;
> - void *mem;
> -
> - local_irq_save(flags);
> -
> - inputarg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - mem = *inputarg;
> - *inputarg = NULL;
> -
> - if (hv_root_partition) {
> - outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> - *outputarg = NULL;
> - }
> -
> - local_irq_restore(flags);
> -
> - kfree(mem);
> + /*
> + * The hyperv_pcpu_input_arg and hyperv_pcpu_output_arg memory
> + * is not freed when the CPU goes offline as the hyperv_pcpu_input_arg
> + * may be used by the Hyper-V vPCI driver in reassigning interrupts
> + * as part of the offlining process. The interrupt reassignment
> + * happens *after* the CPUHP_AP_HYPERV_ONLINE state has run and
> + * called this function.
> + *
> + * If a previously offlined CPU is brought back online again, the
> + * originally allocated memory is reused in hv_common_cpu_init().
> + */
>
> return 0;
> }
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index 0f1001d..696004a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -201,6 +201,7 @@ enum cpuhp_state {
> /* Online section invoked on the hotplugged CPU from the hotplug thread */
> CPUHP_AP_ONLINE_IDLE,
> CPUHP_AP_KVM_ONLINE,
> + CPUHP_AP_HYPERV_ONLINE,
(Cc: KVM)
I would very much prefer we swap the order with KVM here: hv_cpu_init()
allocates and sets vCPU's VP assist page which is used by KVM on
CPUHP_AP_KVM_ONLINE:
kvm_online_cpu() -> __hardware_enable_nolock() ->
kvm_arch_hardware_enable() -> vmx_hardware_enable():
/*
* This can happen if we hot-added a CPU but failed to allocate
* VP assist page for it.
*/
if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
return -EFAULT;
With the change, this is likely broken.
FWIF, KVM also needs current vCPU's VP index (also set by hv_cpu_init())
through __kvm_x86_vendor_init() -> set_hv_tscchange_cb() call chain but
this happens upon KVM module load so CPU hotplug ordering should not
matter as this always happens on a CPU which is already online.
Generally, as 'KVM on Hyper-V' is a supported scenario, doing Hyper-V
init before KVM's (and vice versa on teardown) makes sense.
> CPUHP_AP_SCHED_WAIT_EMPTY,
> CPUHP_AP_SMPBOOT_THREADS,
> CPUHP_AP_X86_VDSO_VMA_ONLINE,
--
Vitaly
More information about the linux-arm-kernel
mailing list