KVM exit to userspace on WFI

Jan Henrik Weinstock jan at mwa.re
Tue Oct 31 12:21:16 PDT 2023


On Mon, Oct 30, 2023 at 13:36, Marc Zyngier <maz at kernel.org> wrote:
>
> [please make an effort not to top-post]
>
> On Fri, 27 Oct 2023 18:41:44 +0100,
> Jan Henrik Weinstock <jan at mwa.re> wrote:
> >
> > Hi Marc,
> >
> > the basic idea behind this is to have a (single-threaded) execution loop,
> > something like this:
> >
> > vcpu-thread:    vcpu-run | process-io-devices | vcpu-run | process-io...
> >                          ^
> >                   WFX or timeout
> >
> > We switch to simulating IO devices whenever the vcpu is idle (wfi) or exceeds
> > a certain budget of instructions (counted via pmu). Our fallback currently is
> > to kick the vcpu out of its execution using a signal (via a timeout/alarm). But
> > of course, if the cpu is stuck at a wfi, we are wasting a lot of time.
> >
> > I understand that the proposed behavior is not desirable for most use cases,
> > which is why I suggest locking it behind a flag, e.g.
> > KVM_ARCH_FLAG_WFX_EXIT_TO_USER.
>
> But how do you reconcile the fact that exposing this to userspace
> breaks fundamental expectations that the guest has, such as getting
> its timer interrupts and directly injected LPIs? Implementing WFI in
> userspace breaks it. What about the case where we don't trap WFx and
> let the *guest* wait for an interrupt?

Timer interrupts etc. will be injected into the vcpu during the
io-phases. When there are no interrupts present and the guest performs
a WFI, we can just skip forward to the next timer event.
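
As a rough illustration of that loop (a sketch only, not our actual code and
not using the real KVM API): `struct sim`, `vcpu_run`, `process_io_devices`,
`run_loop` and the exit reasons below are hypothetical names made up for this
example. The stand-in `vcpu_run` reports a WFI whenever no interrupt is
pending, and the loop then advances virtual time straight to the next timer
deadline instead of sleeping on the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exit reasons for this sketch; these are not KVM exit codes. */
enum exit_reason { EXIT_WFI, EXIT_BUDGET };

/* Hypothetical simulator state. */
struct sim {
    uint64_t now;           /* current virtual time, in ticks */
    uint64_t next_timer;    /* absolute deadline of the next timer event */
    bool     irq_pending;   /* interrupt waiting to be injected */
    unsigned irqs_injected; /* number of interrupts delivered so far */
};

/* IO phase: raise the timer interrupt once its deadline is reached, re-arm. */
static void process_io_devices(struct sim *s, uint64_t period)
{
    assert(period > 0);
    if (s->now >= s->next_timer) {
        s->irq_pending = true;
        s->next_timer += period;
    }
}

/*
 * Stand-in for running the vcpu: with an interrupt pending the guest takes
 * it and uses up its instruction budget; otherwise it idles in WFI at once.
 */
static enum exit_reason vcpu_run(struct sim *s, uint64_t budget)
{
    if (s->irq_pending) {
        s->irq_pending = false;
        s->irqs_injected++;
        s->now += budget;
        return EXIT_BUDGET;
    }
    return EXIT_WFI;
}

/* vcpu-run | process-io-devices | vcpu-run | ... on a single thread. */
static void run_loop(struct sim *s, uint64_t budget, uint64_t period,
                     unsigned iterations)
{
    for (unsigned i = 0; i < iterations; i++) {
        enum exit_reason why = vcpu_run(s, budget);

        /* Guest is idle: skip virtual time forward to the next timer event. */
        if (why == EXIT_WFI && s->now < s->next_timer)
            s->now = s->next_timer;

        process_io_devices(s, period);
    }
}
```

Starting at time 0 with the first timer at tick 100, a period of 100 and a
budget of 10, four iterations alternate between "idle, skip to the deadline"
and "take the interrupt, run for the budget".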

> Honestly, what you are describing seems to be a use model that doesn't
> fit KVM, which is a general purpose hypervisor, but more a simulation
> environment. Yes, the primitives are the same, but the plumbing is
> wildly different.

Agreed.

> *If* that's the stuff you're looking at, then I'm afraid you'll have
> to do it in different way, because what you are suggesting is
> fundamentally incompatible with the guarantees that KVM gives to guest
> and userspace. Because your KVM_ARCH_FLAG_WFX_EXIT_TO_USER is really a
> lie. It should really be named something more along the lines of
> KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN
> (probably with additional clauses related to breaking things).

I have attached a reworked version of the patch as a reference (based
on my 5.15 kernel). It puts the modified behavior behind a new
capability so that it does not interfere with the current expectations
for WFI/WFE handling.
I think it should now trap every blocking WFx on the vcpu and reliably
return to userspace. If I have missed something that would cause the
vcpu not to trap on a WFI, kindly let me know.

> Overall, you are still asking for something that is not guaranteed at
> the architecture level, even less in KVM, and I'm not going to add
> support for something that can only work "sometime".

I am not quite sure what you mean by "sometime". Are you referring
to WFIs as NOPs? Or WFIs that do not yield because of pending
interrupts?

The point of my patch is not to accurately count every single WFI. The
point is to prevent the host cpu from sleeping just because my vcpu
executed a WFI somewhere in the guest software. If the guest executes
a WFI and that does not cause my vcpu thread to block (in other words:
the vcpu continues executing instructions beyond the WFI), then it
should not exit to userspace either. So instead of
"KVM_ARCH_FLAG_WFX_EXIT_TO_USER_SOMETIME_AND_I_DONT_EVEN_KNOW_WHEN" it
is really "KVM_ARCH_FLAG_WFX_EXIT_TO_USER_WHENEVER_YOU_WOULD_OTHERWISE_YIELD_AND_I_CANNOT_GET_MY_THREAD_BACK".
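
To spell out the semantics I have in mind (again only a sketch, not the code
from the attached patch): `wfx_decide`, `enum wfx_action` and the two boolean
inputs are hypothetical names for this example. The only case that changes is
the one where the vcpu thread would otherwise block.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a trapped WFI/WFE in this sketch. */
enum wfx_action {
    WFX_CONTINUE_GUEST,  /* wake-up event already pending, WFx acts as a NOP */
    WFX_BLOCK_IN_KERNEL, /* default behavior: the vcpu thread sleeps */
    WFX_EXIT_TO_USER     /* proposed: hand the thread back to userspace */
};

/*
 * cap_enabled: the new capability was switched on for this VM (assumption).
 * irq_pending: an interrupt is already pending for the vcpu (assumption).
 */
static enum wfx_action wfx_decide(bool cap_enabled, bool irq_pending)
{
    if (irq_pending)
        return WFX_CONTINUE_GUEST; /* nothing would block, so no exit */

    return cap_enabled ? WFX_EXIT_TO_USER : WFX_BLOCK_IN_KERNEL;
}
```

With the capability disabled, the outcome is the same as today in both cases.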

>         M.
>
> --
> Without deviation from the norm, progress is not possible.



-- 
Dr.-Ing. Jan Henrik Weinstock
Managing Director

MachineWare GmbH | www.machineware.de
Hühnermarkt 19, 52062 Aachen, Germany
Amtsgericht Aachen HRB25734

Managing Directors
Lukas Jünger
Dr.-Ing. Jan Henrik Weinstock
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kvm.patch
Type: text/x-patch
Size: 3360 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20231031/e30b97ce/attachment-0001.bin>

