[RFC PATCH 1/5] ARM/ARM64: KVM: Update user space API header for PSCI emulation

Alexander Graf agraf at suse.de
Thu Oct 17 08:01:18 EDT 2013


On 17.10.2013, at 13:55, Marc Zyngier <marc.zyngier at arm.com> wrote:

> On 17/10/13 12:49, Alexander Graf wrote:
>> 
>> On 17.10.2013, at 13:30, Anup Patel <anup at brainfault.org> wrote:
>> 
>>> On Thu, Oct 17, 2013 at 4:51 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
>>>> On 17/10/13 12:10, Anup Patel wrote:
>>>>> On Thu, Oct 17, 2013 at 2:17 PM, Marc Zyngier <marc.zyngier at arm.com> wrote:
>>>>>> On 17/10/13 07:45, Anup Patel wrote:
>>>>>>> On Thu, Oct 17, 2013 at 3:41 AM, Christoffer Dall
>>>>>>> <christoffer.dall at linaro.org> wrote:
>>>>>>>> On Wed, Oct 16, 2013 at 10:32:30PM +0530, Anup Patel wrote:
>>>>>>>>> Update user space API interface headers for providing information to
>>>>>>>>> user space needed to emulate PSCI function calls in user space (i.e.
>>>>>>>>> QEMU or KVMTOOL).
>>>>>>>>> 
>>>>>>>>> Signed-off-by: Anup Patel <anup.patel at linaro.org>
>>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar at linaro.org>
>>>>>>>>> ---
>>>>>>>>> include/uapi/linux/kvm.h |    7 +++++++
>>>>>>>>> 1 file changed, 7 insertions(+)
>>>>>>>>> 
>>>>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>>>>>> index e32e776..dae2664 100644
>>>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>>>> @@ -171,6 +171,7 @@ struct kvm_pit_config {
>>>>>>>>> #define KVM_EXIT_WATCHDOG         21
>>>>>>>>> #define KVM_EXIT_S390_TSCH        22
>>>>>>>>> #define KVM_EXIT_EPR              23
>>>>>>>>> +#define KVM_EXIT_PSCI             24
>>>>>>>>> 
>>>>>>>>> /* For KVM_EXIT_INTERNAL_ERROR */
>>>>>>>>> /* Emulate instruction failed. */
>>>>>>>>> @@ -301,6 +302,12 @@ struct kvm_run {
>>>>>>>>>             struct {
>>>>>>>>>                     __u32 epr;
>>>>>>>>>             } epr;
>>>>>>>>> +             /* KVM_EXIT_PSCI */
>>>>>>>>> +             struct {
>>>>>>>>> +                     __u32 fn;
>>>>>>>>> +                     __u64 args[7];
>>>>>>>>> +                     __u64 ret[4];
>>>>>>>>> +             } psci;
>>>>>>>>>             /* Fix the size of the union. */
>>>>>>>>>             char padding[256];
>>>>>>>>>     };
>>>>>>>>> --
>>>>>>>>> 1.7.9.5
>>>>>>>>> 
>>>>>>>> I am also wondering if this is not solving a very specific need without
>>>>>>>> thinking a little more carefully about this problem.
>>>>>>> 
>>>>>>> No, it's not solving a specific problem.
>>>>>>> 
>>>>>>> In fact, it's more general because we pass complete info required to
>>>>>>> emulate a PSCI call in user space.
>>>>>>> (Please refer to the PSCI calling convention)
>>>>>>> 
>>>>>>>> 
>>>>>>>> We have previously discussed the need for some secure side emulation
>>>>>>>> in QEMU, and I think perhaps we need something more generic which allows
>>>>>>>> user space to handle SMC calls and/or allows user space to "inject" some
>>>>>>>> secure world runtime that the kernel can run in a partially or fully
>>>>>>>> isolated container to handle SMC calls.
>>>>>>>> 
>>>>>>>> Peter raised this issue previously and pointed to a proposal he had as
>>>>>>>> well.
>>>>>>> 
>>>>>>> If required we can have an additional field in kvm_run->psci which tells
>>>>>>> whether the PSCI call is an SMC call or HVC call.
>>>>>>> 
>>>>>>>> 
>>>>>>>> Is there a technical reason why we need something specifically directed
>>>>>>>> to PSCI?
>>>>>>> 
>>>>>>> It's quite natural to add this to PSCI emulation in KVM ARM/ARM64 instead
>>>>>>> of adding a separate VirtIO device for System reboot and System poweroff.
>>>>>>> 
>>>>>>> Also in the process of implementing SYSTEM_OFF and SYSTEM_RESET
>>>>>>> emulation in user space we would also have an infrastructure for adding
>>>>>>> emulation of new PSCI calls in user space.
>>>>>> 
>>>>>> And I strongly object to that. It creates consistency issues (what if
>>>>>> userspace implements one version of PSCI, and the kernel another?), and
>>>>>> also some really horrible situations: Imagine you implement the SUSPEND
>>>>>> operation in userspace, and want to wake the vcpu up with an interrupt.
>>>>>> You'd end up having to keep track of the state in the kernel, having to
>>>>>> forward the interrupt event to userspace...
>>>>> 
>>>>> It is not about emulating all PSCI functions in user space. It's about forwarding
>>>>> system-level PSCI functions or PSCI functions which cannot be emulated in
>>>>> kernel to user space.
>>>> 
>>>> The CPU parts of PSCI can perfectly be implemented in the kernel.
>>> 
>>> Agreed. This patch does the same.
>>> 
>>>> 
>>>> Then you can return something to userspace indicating what just
>>>> happened. And it doesn't have to be PSCI specific.
>>> 
>>> Are you suggesting that every time we want to emulate some new
>>> PSCI call with help from user space (e.g. SYSTEM_OFF and
>>> SYSTEM_RESET), we add new exit reasons and just keep on
>>> increasing KVM exit reasons?
>>> 
>>> Why can't the exit reason and exit info in struct kvm_run be
>>> PSCI specific?
>>> 
>>> On the contrary, it will be good to have the exit reason and exit info
>>> PSCI specific, because we have the PSCI specification, which tells
>>> how it is to be emulated.
>> 
>> I completely agree with Marc that split-brain ownership of any address space (and PSCI is basically one) is a very bad idea.
>> 
>> However, so far the only solution I've seen mentioned is that the kernel owns PSCI (read: decodes it) and then drives user space with explicit commands.
>> 
>> Couldn't we reverse this logic? User space owns PSCI. By default all PSCI calls go to user space. If it makes more sense for a PSCI call to be executed by kvm, user space can explicitly route it to be handled by kvm instead.
>> 
>> That way the owner is still at a single spot and we can fast path the few cases that may be performance critical or a lot easier to handle in kvm.
>> 
>> The good part about this is that we get consistency in QEMU with the TCG PSCI handlers along the way.
> 
> The only snag here is that you can't do that for every function: SUSPEND
> is one, for example. Once your vcpu is suspended, you need to wake it
> up with an interrupt, and interrupts are not routed to userspace (TFFT!).

Not sure I understand. Can't you just vcpu_kick() it with a POSIX signal to get it out of vcpu_run() and unset the "suspended" state? If you guarantee that you don't get spurious exits out of SUSPEND, you need to be able to set/unset that bit anyway for migration.
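
To make the kick idea concrete, here is a rough, KVM-free sketch in plain POSIX C. Everything named toy_* is hypothetical and only for illustration: toy_vcpu_run() stands in for the blocking KVM_RUN ioctl, and the "suspended" flag is ordinary user space state, not an existing KVM field. The kick signal is kept blocked outside of the run call and unblocked atomically while waiting (similar in spirit to what KVM_SET_SIGNAL_MASK arranges for KVM_RUN), so a kick that arrives early is not lost. For brevity the kick is sent with raise() from the same thread; a real VMM would presumably use pthread_kill() from another thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical user space bookkeeping for one vcpu; not a KVM structure. */
struct toy_vcpu {
	int suspended;   /* set when the guest issued a SUSPEND-like call */
	int last_errno;  /* errno seen by the most recent toy_vcpu_run() */
};

/* Empty handler: its only job is to make the blocking call return EINTR. */
static void toy_kick_handler(int sig)
{
	(void)sig;
}

/*
 * Stand-in for ioctl(vcpu_fd, KVM_RUN, 0).  Waits with SIGUSR1 unblocked;
 * sigsuspend() swaps the mask atomically, so a kick that is already
 * pending is delivered immediately instead of being missed.
 */
static int toy_vcpu_run(struct toy_vcpu *v)
{
	sigset_t run_mask;

	/* Query the current mask, then allow only the kick signal through. */
	if (sigprocmask(SIG_BLOCK, NULL, &run_mask) != 0)
		return -1;
	sigdelset(&run_mask, SIGUSR1);

	sigsuspend(&run_mask);          /* always returns -1 */
	v->last_errno = errno;
	return -1;
}

/* Returns 0 if the kick got the vcpu out of the run call and resumed it. */
static int toy_kick_demo(void)
{
	struct toy_vcpu v = { .suspended = 1, .last_errno = 0 };
	struct sigaction sa;
	sigset_t block, old;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = toy_kick_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	/* Keep the kick signal blocked everywhere except inside the run call. */
	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
		return -1;

	raise(SIGUSR1);                 /* the "kick"; stays pending for now */

	toy_vcpu_run(&v);               /* comes back with EINTR */
	if (v.last_errno == EINTR)
		v.suspended = 0;        /* user space clears the state itself */

	sigprocmask(SIG_SETMASK, &old, NULL);
	return v.suspended == 0 ? 0 : -1;
}
```

Calling toy_kick_demo() from a small main() should return 0: the pending signal interrupts the wait, and user space then clears the flag. Whether the same bookkeeping is workable for a real SUSPEND implementation is exactly the open question here.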


Alex

> 
> So it becomes yet another can of worms, and I'd rather keep it simple.
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...
> 



