[PATCH v12 04/16] arm64: kvm: allows kvm cpu hotplug
Marc Zyngier
marc.zyngier at arm.com
Mon Dec 14 09:33:24 PST 2015
On 14/12/15 07:33, AKASHI Takahiro wrote:
> Marc,
>
> On 12/12/2015 01:28 AM, Marc Zyngier wrote:
>> On 11/12/15 08:06, AKASHI Takahiro wrote:
>>> Ashwin, Marc,
>>>
>>> On 12/03/2015 10:58 PM, Marc Zyngier wrote:
>>>> On 02/12/15 22:40, Ashwin Chaugule wrote:
>>>>> Hello,
>>>>>
>>>>> On 24 November 2015 at 17:25, Geoff Levand <geoff at infradead.org> wrote:
>>>>>> From: AKASHI Takahiro <takahiro.akashi at linaro.org>
>>>>>>
>>>>>> The current kvm implementation on arm64 does cpu-specific initialization
>>>>>> at system boot, and has no way to gracefully shutdown a core in terms of
>>>>>> kvm. This prevents, especially, kexec from rebooting the system on a boot
>>>>>> core in EL2.
>>>>>>
>>>>>> This patch adds a cpu tear-down function and also puts an existing cpu-init
>>>>>> code into a separate function, kvm_arch_hardware_disable() and
>>>>>> kvm_arch_hardware_enable() respectively.
>>>>>> We don't need arm64-specific cpu hotplug hook any more.
>>>>>>
>>>>>> Since this patch modifies common part of code between arm and arm64, one
>>>>>> stub definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
>>>>>> compiling errors.
>>>>>>
>>>>>> Signed-off-by: AKASHI Takahiro <takahiro.akashi at linaro.org>
>>>>>> ---
>>>>>> arch/arm/include/asm/kvm_host.h | 10 ++++-
>>>>>> arch/arm/include/asm/kvm_mmu.h | 1 +
>>>>>> arch/arm/kvm/arm.c | 79 ++++++++++++++++++---------------------
>>>>>> arch/arm/kvm/mmu.c | 5 +++
>>>>>> arch/arm64/include/asm/kvm_host.h | 16 +++++++-
>>>>>> arch/arm64/include/asm/kvm_mmu.h | 1 +
>>>>>> arch/arm64/include/asm/virt.h | 9 +++++
>>>>>> arch/arm64/kvm/hyp-init.S | 33 ++++++++++++++++
>>>>>> arch/arm64/kvm/hyp.S | 32 ++++++++++++++--
>>>>>> 9 files changed, 138 insertions(+), 48 deletions(-)
>>>>>
>>>>> [..]
>>>>>
>>>>>>
>>>>>>
>>>>>> static struct notifier_block hyp_init_cpu_pm_nb = {
>>>>>> @@ -1108,11 +1119,6 @@ static int init_hyp_mode(void)
>>>>>> }
>>>>>>
>>>>>> /*
>>>>>> - * Execute the init code on each CPU.
>>>>>> - */
>>>>>> - on_each_cpu(cpu_init_hyp_mode, NULL, 1);
>>>>>> -
>>>>>> - /*
>>>>>> * Init HYP view of VGIC
>>>>>> */
>>>>>> err = kvm_vgic_hyp_init();
>>>>>
>>>>> With this flow, the cpu_init_hyp_mode() is called only at VM guest
>>>>> creation, but vgic_hyp_init() is called at bootup. On a system with
>>>>> GICv3, it looks like we end up with bogus values from the ICH_VTR_EL2
>>>>> (to get the number of LRs), because we're not reading it from EL2
>>>>> anymore.
>>>
>>> Thank you for pointing this out.
>>> Recently I tested my kdump code on hikey, and as hikey(hi6220) has gic-400,
>>> I didn't notice this problem.
>>
>> Because GIC-400 is a GICv2 implementation, which is entirely MMIO based.
>> GICv3 uses some system registers that are only available at EL2, and KVM
>> needs some information contained in these registers before being able to
>> get initialized.
>
> I see.
>
>>>> Indeed, this is completely broken (I just reproduced the issue on a
>>>> model). I wish this kind of details had been checked earlier, but thanks
>>>> for pointing it out.
>>>>
>>>>> What's the best way to fix this?
>>>>> - Call kvm_arch_hardware_enable() before vgic_hyp_init() and disable later?
>>>>> - Fold the VGIC init stuff back into hardware_enable()?
>>>>
>>>> None of that works - kvm_arch_hardware_enable() is called once per CPU,
>>>> while vgic_hyp_init() can only be called once. Also,
>>>> kvm_arch_hardware_enable() is called from interrupt context, and I
>>>> wouldn't feel comfortable starting probing DT and allocating stuff from
>>>> there.
>>>
>>> Do you think so?
>>> How about the fixup! patch attached below?
>>> The point is that, like Ashwin's first idea, we initialize cpus temporarily
>>> before kvm_vgic_hyp_init() and then soon reset cpus again. Thus,
>>> kvm cpu hotplug will still continue to work as before.
>>> Now that cpu_init_hyp_mode() is revived as exactly the same as Marc's
>>> original code, the change will not be a big jump.
>>
>> This seems quite complicated:
>> - init EL2 on all CPUs
>> - do some initialization
>> - tear all CPUs EL2 down
>> - let KVM drive the vectors being set or not
>>
>> My questions are: why do we need to do this on *all* cpus? Can't that
>> work on a single one?
>
> I did initialize all the cpus partly because using preempt_enable/disable
> looked a bit ugly and partly because we may, in the future, do additional
> per-cpu initialization in kvm_vgic_hyp_init() and/or kvm_timer_hyp_init().
> But if you're comfortable with preempt_*() stuff, I don't care.
>
>
>> Also, the simple fact that we were able to get some junk value is a sign
>> that something is amiss. I'd expect a splat of some sort, because we now
>> have a possibility of doing things in the wrong context.
>>
>>>
>>> If kvm_hyp_call() in vgic_v3_probe()/kvm_vgic_hyp_init() is a *problem*,
>>> I hope this should work. Actually I confirmed that, with this fixup! patch,
>>> we could run a kvm guest and also successfully executed kexec on model w/gic-v3.
>>>
>>> My only concern is the following kernel message I saw when kexec shut down
>>> the kernel:
>>> (Please note that I was running one kvm guest (pid=961) here.)
>>>
>>> ===
>>> sh-4.3# ./kexec -d -e
>>> kexec version: 15.11.16.11.06-g41e52e2
>>> arch_process_options:112: command_line: (null)
>>> arch_process_options:114: initrd: (null)
>>> arch_process_options:115: dtb: (null)
>>> arch_process_options:117: port: 0x0
>>> kvm: exiting hardware virtualization
>>> kvm [961]: Unsupported exception type: 6248304 <== this message
>>
>> That makes me feel very uncomfortable. It looks like we've exited a
>> guest with some horrible value in X0. How is that even possible?
>>
>> This deserves to be investigated.
>
> I guess the problem is that cpu tear-down function is called even if a kvm guest
> is still running in kvm_arch_vcpu_ioctl_run().
> So adding a check whether cpu has been initialized or not in every iteration of
> kvm_arch_vcpu_ioctl_run() will, if necessary, terminate a guest safely without entering
> a guest mode. Since this check is done while interrupt is disabled, it won't
> interfere with kvm_arch_hardware_disable() called via IPI.
> See the attached fixup patch.
>
> Again, I verified the code on model.
>
> Thanks,
> -Takahiro AKASHI
>
>> Thanks,
>>
>> M.
>>
>
> ----8<----
> From 77f273ba5e0c3dfcf75a5a8d1da8035cc390250c Mon Sep 17 00:00:00 2001
> From: AKASHI Takahiro <takahiro.akashi at linaro.org>
> Date: Fri, 11 Dec 2015 13:43:35 +0900
> Subject: [PATCH] fixup! arm64: kvm: allows kvm cpu hotplug
>
> ---
> arch/arm/kvm/arm.c | 45 ++++++++++++++++++++++++++++++++++-----------
> 1 file changed, 34 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 518c3c7..d7e86fb 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -573,7 +573,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> /*
> * Re-check atomic conditions
> */
> - if (signal_pending(current)) {
> + if (__hyp_get_vectors() == hyp_default_vectors) {
> + /* cpu has been torn down */
> + ret = -ENOEXEC;
> + run->exit_reason = KVM_EXIT_SHUTDOWN;

That feels completely overkill (and very slow). Why don't you maintain a
per-cpu variable containing the CPU states, which will avoid calling
__hyp_get_vectors() all the time? You should be able to reuse that
construct everywhere.
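
To illustrate the per-cpu state idea, here is a minimal userspace sketch.
It is not kernel code: the array stands in for what would be a real
per-cpu variable, and every name in it (NR_CPUS_SKETCH, hyp_initialized,
mark_hyp_enabled(), mark_hyp_disabled(), may_enter_guest()) is
hypothetical. The point is only that the run loop can test a cached flag
instead of calling __hyp_get_vectors() on every iteration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 8	/* hypothetical fixed CPU count for this sketch */

/* Hypothetical stand-in for a per-cpu "EL2 is set up" flag. */
static bool hyp_initialized[NR_CPUS_SKETCH];

/* Would be set once the init code has run on 'cpu'. */
static void mark_hyp_enabled(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	hyp_initialized[cpu] = true;
}

/* Would be cleared once the tear-down code has run on 'cpu'. */
static void mark_hyp_disabled(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	hyp_initialized[cpu] = false;
}

/*
 * Cheap check for the "re-check atomic conditions" step of the run loop:
 * returns 0 if the guest may be entered on 'cpu', -ENOEXEC otherwise.
 */
static int may_enter_guest(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return hyp_initialized[cpu] ? 0 : -ENOEXEC;
}
```

The same flag could be consulted by the enable/disable paths and by the
CPU PM notifier, which is what "reuse that construct everywhere" refers to.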

Also, I'm not sure about KVM_EXIT_SHUTDOWN. This looks very x86 specific
(called on triple fault). KVM_EXIT_FAIL_ENTRY looks more appropriate,
and the hardware_entry_failure_reason field should be populated (and
documented).
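
As a rough sketch of what populating that field could look like, the
fragment below uses a mock of the relevant part of struct kvm_run so that
it builds outside the kernel. struct kvm_run_sketch, the reason code
FAIL_ENTRY_CPU_TORN_DOWN_SKETCH and report_cpu_torn_down() are
assumptions made up for illustration; only KVM_EXIT_FAIL_ENTRY and the
fail_entry.hardware_entry_failure_reason field name come from the KVM
userspace API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_EXIT_FAIL_ENTRY 9	/* same value as in <linux/kvm.h> */

/* Hypothetical reason code; a real one would have to be documented. */
#define FAIL_ENTRY_CPU_TORN_DOWN_SKETCH 1ULL

/* Mock of the two kvm_run members this sketch touches. */
struct kvm_run_sketch {
	uint32_t exit_reason;
	struct {
		uint64_t hardware_entry_failure_reason;
	} fail_entry;
};

/* Tell userspace why the guest could not be entered on this CPU. */
static void report_cpu_torn_down(struct kvm_run_sketch *run)
{
	assert(run != NULL);
	run->exit_reason = KVM_EXIT_FAIL_ENTRY;
	run->fail_entry.hardware_entry_failure_reason =
		FAIL_ENTRY_CPU_TORN_DOWN_SKETCH;
}
```

Userspace would then see KVM_EXIT_FAIL_ENTRY together with a reason it
can look up, rather than an x86-flavoured shutdown exit.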
Thanks,
M.
--
Jazz is not dead. It just smells funny...