[RFC PATCH 1/5] ARM/ARM64: KVM: Update user space API header for PSCI emulation

Anup Patel anup at brainfault.org
Thu Oct 17 11:32:53 EDT 2013


On Thu, Oct 17, 2013 at 5:25 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
> On 17/10/13 12:49, Alexander Graf wrote:
>>
>> On 17.10.2013, at 13:30, Anup Patel <anup at brainfault.org> wrote:
>>
>>> On Thu, Oct 17, 2013 at 4:51 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
>>>> On 17/10/13 12:10, Anup Patel wrote:
>>>>> On Thu, Oct 17, 2013 at 2:17 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
>>>>>> On 17/10/13 07:45, Anup Patel wrote:
>>>>>>> On Thu, Oct 17, 2013 at 3:41 AM, Christoffer Dall
>>>>>>> <christoffer.dall at linaro.org> wrote:
>>>>>>>> On Wed, Oct 16, 2013 at 10:32:30PM +0530, Anup Patel wrote:
>>>>>>>>> Update user space API interface headers for providing information to
>>>>>>>>> user space needed to emulate PSCI function calls in user space (i.e.
>>>>>>>>> QEMU or KVMTOOL).
>>>>>>>>>
>>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
>>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar at linaro.org>
>>>>>>>>> ---
>>>>>>>>> include/uapi/linux/kvm.h |    7 +++++++
>>>>>>>>> 1 file changed, 7 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>>>>>> index e32e776..dae2664 100644
>>>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>>>> @@ -171,6 +171,7 @@ struct kvm_pit_config {
>>>>>>>>> #define KVM_EXIT_WATCHDOG         21
>>>>>>>>> #define KVM_EXIT_S390_TSCH        22
>>>>>>>>> #define KVM_EXIT_EPR              23
>>>>>>>>> +#define KVM_EXIT_PSCI             24
>>>>>>>>>
>>>>>>>>> /* For KVM_EXIT_INTERNAL_ERROR */
>>>>>>>>> /* Emulate instruction failed. */
>>>>>>>>> @@ -301,6 +302,12 @@ struct kvm_run {
>>>>>>>>>              struct {
>>>>>>>>>                      __u32 epr;
>>>>>>>>>              } epr;
>>>>>>>>> +             /* KVM_EXIT_PSCI */
>>>>>>>>> +             struct {
>>>>>>>>> +                     __u32 fn;
>>>>>>>>> +                     __u64 args[7];
>>>>>>>>> +                     __u64 ret[4];
>>>>>>>>> +             } psci;
>>>>>>>>>              /* Fix the size of the union. */
>>>>>>>>>              char padding[256];
>>>>>>>>>      };
>>>>>>>>> --
>>>>>>>>> 1.7.9.5
>>>>>>>>>
>>>>>>>> I am also wondering if this is not solving a very specific need without
>>>>>>>> thinking a little more carefully about this problem.
>>>>>>>
>>>>>>> No, it's not solving a specific problem.
>>>>>>>
>>>>>>> In fact, it's more general because we pass the complete info required to
>>>>>>> emulate a PSCI call in user space.
>>>>>>> (Please refer to the PSCI calling convention.)
>>>>>>>
>>>>>>>>
>>>>>>>> We have previously discussed the need for some secure side emulation
>>>>>>>> in QEMU, and I think perhaps we need something more generic which allows
>>>>>>>> user space to handle SMC calls and/or allows user space to "inject" some
>>>>>>>> secure world runtime that the kernel can run in a partially or fully
>>>>>>>> isolated container to handle SMC calls.
>>>>>>>>
>>>>>>>> Peter raised this issue previously and pointed to a proposal he had as
>>>>>>>> well.
>>>>>>>
>>>>>>> If required we can have an additional field in kvm_run->psci which tells
>>>>>>> whether the PSCI call is an SMC call or HVC call.
>>>>>>>
>>>>>>>>
>>>>>>>> Is there a technical reason why we need something specifically directed
>>>>>>>> to PSCI?
>>>>>>>
>>>>>>> It's quite natural to add this to PSCI emulation in KVM ARM/ARM64 instead
>>>>>>> of adding a separate VirtIO device for system reboot and system poweroff.
>>>>>>>
>>>>>>> Also in the process of implementing SYSTEM_OFF and SYSTEM_RESET
>>>>>>> emulation in user space we would also have an infrastructure for adding
>>>>>>> emulation of new PSCI calls in user space.
>>>>>>
>>>>>> And I strongly oppose that. It creates consistency issues (what if
>>>>>> userspace implements one version of PSCI, and the kernel another?), and
>>>>>> also some really horrible situations: Imagine you implement the SUSPEND
>>>>>> operation in userspace, and want to wake the vcpu up with an interrupt.
>>>>>> You'd end up having to keep track of the state in the kernel, having to
>>>>>> forward the interrupt event to userspace...
>>>>>
>>>>> It is not about emulating all PSCI functions in user space. It's about forwarding
>>>>> system-level PSCI functions, or PSCI functions which cannot be emulated in
>>>>> the kernel, to user space.
>>>>
>>>> The CPU parts of PSCI can perfectly be implemented in the kernel.
>>>
>>> Agreed. This patch series does the same.
>>>
>>>>
>>>> Then you can return something to userspace indicating what just
>>>> happened. And it doesn't have to be PSCI specific.
>>>
>>> Are you suggesting that every time we want to emulate some new
>>> PSCI call with help from user space (e.g. SYSTEM_OFF and
>>> SYSTEM_RESET), we add new exit reasons and just keep on
>>> increasing the number of KVM exit reasons?
>>>
>>> Why can't the exit reason and exit info in struct kvm_run be
>>> PSCI specific?
>>>
>>> On the contrary, it would be good to have the exit reason and exit info
>>> PSCI specific, because we have the PSCI specification, which tells
>>> how each call is to be emulated.
>>
>> I completely agree with Marc that split-brain ownership of any address space (and PSCI is basically one) is a very bad idea.
>>
>> However, so far the only solution I've seen mentioned is that the kernel owns PSCI (read: decodes it) and then drives user space with explicit commands.
>>
>> Couldn't we reverse this logic? User space owns PSCI. By default all PSCI calls go to user space. If it makes more sense for a PSCI call to be executed by kvm, user space can explicitly route it to be handled by kvm instead.
>>
>> That way the owner is still at a single spot and we can fast path the few cases that may be performance critical or a lot easier to handle in kvm.
>>
>> The good part about this is that we get consistency in QEMU with the TCG PSCI handlers along the way.
>
> The only nag here is that you can't do that for every function: SUSPEND
> is one, for example. Once your vcpu is suspended, you need to wake it
> up with an interrupt, and interrupts are not routed to userspace (TFFT!).

SUSPEND, OFF, and ON are VCPU-level PSCI functions and hence have to be
done in kernel KVM code.

SYSTEM_OFF and SYSTEM_RESET, on the other hand, are board-level
PSCI functions and hence have to be done in user space.
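
To make the split concrete, here is a minimal sketch in C. It is only an
illustration: the enum values and psci_route() are hypothetical names, not
part of the KVM API, and the values are not the function IDs from the PSCI
specification.

```c
/*
 * Hypothetical identifiers for illustration only. These are NOT the
 * PSCI function IDs defined by the PSCI spec or used by KVM.
 */
enum psci_fn {
	PSCI_FN_CPU_SUSPEND,
	PSCI_FN_CPU_OFF,
	PSCI_FN_CPU_ON,
	PSCI_FN_SYSTEM_OFF,
	PSCI_FN_SYSTEM_RESET,
};

/* Who emulates a given PSCI function (hypothetical type). */
enum psci_owner {
	PSCI_OWNER_KERNEL,	/* VCPU-level: handled in kernel KVM code */
	PSCI_OWNER_USER,	/* board-level: forwarded to user space */
	PSCI_OWNER_NONE,	/* unknown function */
};

/* Classify a PSCI function as VCPU-level or board-level. */
enum psci_owner psci_route(unsigned int fn)
{
	switch (fn) {
	case PSCI_FN_CPU_SUSPEND:
	case PSCI_FN_CPU_OFF:
	case PSCI_FN_CPU_ON:
		return PSCI_OWNER_KERNEL;
	case PSCI_FN_SYSTEM_OFF:
	case PSCI_FN_SYSTEM_RESET:
		return PSCI_OWNER_USER;
	default:
		return PSCI_OWNER_NONE;
	}
}
```

With a classification like this, only the PSCI_OWNER_USER cases would need
to reach user space, for example through an exit such as the KVM_EXIT_PSCI
proposed in this patch.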

--
Anup

>
> So it becomes yet another can of worms, and I'd rather keep it simple.
>
>         M.
> --
> Jazz is not dead. It just smells funny...
>


