[Question] How to test the SDEI client driver

Paolo Bonzini pbonzini at redhat.com
Wed Jul 8 12:49:50 EDT 2020


On 08/07/20 18:11, James Morse wrote:
> Hi Gavin,
> 
> On 03/07/2020 01:26, Gavin Shan wrote:
>> On 7/1/20 9:57 PM, James Morse wrote:
>>> On 30/06/2020 06:17, Gavin Shan wrote:
>>>> I'm currently looking into the SDEI client driver and reworking it so that
>>>> it can provide capabilities/services to arm64/kvm to get it virtualized.
>>>
>>> What do you mean by virtualised? The expectation is the VMM would implement the 'firmware'
>>> side of this. 'events' are most likely to come from the VMM, and having to handshake with
>>> the kernel to work out if the event you want to inject is registered and enabled is
>>> over-complicated. Supporting it in the VMM means you can notify a different vCPU if that
>>> is appropriate, or take a different action if the event isn't registered.
>>>
>>> This was all blocked on finding a future-proof way for tools like Qemu to consume
>>> reference code from ATF.
> 
>> Sorry that I didn't explain the background last time. We plan to use SDEI to
>> deliver the notification (signal) from host to guest, needed by the asynchronous
>> page fault feature. The RFCv2 patchset was posted a while ago [1].
> 
> Thanks. So this is to hint to the guest that you'd swapped its memory to disk. Yuck.
> 
> When would you do this?

These days, the main reason is on-demand paging with live migration.
Instead of waiting to have a consistent version of guest memory on the
destination, memory that the guest has dirtied can be copied on demand
from source to destination while the guest is running.  Letting the
guest reschedule is surprisingly effective in this case, especially with
workloads that have a lot of threads.

> Isn't this roughly equivalent to SMT CPUs taking a cache-miss? ...
> If you pinned two vCPUs to one physical CPU, the host:scheduler would multiplex between
> them. If one couldn't do useful work because it was waiting for memory, the other gets
> all the slack time. (the TLB maintenance would hurt, but not as much as waiting for the disk)
> The good news is the guest:scheduler already knows how to deal with this!
> (and, it works for other OS too)

The order of magnitude of both the wait and the reschedule is too
different for SMT heuristics to be applicable here.  In particular, two
SMT sibling pCPUs compete equally for fetch resources, while two vCPUs
pinned to the same pCPU would only reschedule a few hundred times per
second.  Latency would be in the milliseconds and jitter would be
horrible.

> Wouldn't it be better to let the guest make the swapping decision? 
> You could provide a fast virtio swap device to the guest that is
> backed by maybe-swapped host memory.

I think you are describing something similar to "transcendent memory",
which Xen implemented about 10 years ago
(https://lwn.net/Articles/454795/).  Unfortunately you've probably never
heard of it, for good reason. :)

The main showstopper is that you cannot rely on guest cooperation (also
because async page faults work surprisingly well without it).

>> The SDEI events needed by the async page fault feature originate from
>> KVM (the host). In order to achieve that, KVM needs some code so that
>> SDEI events can be injected and delivered, and the SDEI-related
>> hypercalls need to be handled as well.
> 
> I avoided doing this because it makes it massively complicated for the VMM. All that
> in-kernel state now has to be migrated. KVM has to expose APIs to let the VMM inject
> events, which gets nasty for shared events where some CPUs are masked, and others aren't.
> 
> Having something like Qemu drive the reference code from TFA is the right thing to do for
> SDEI.

Are there use cases for injecting SDEIs from QEMU?

If not, this can be done much more easily in KVM, which would only ever
have to deal with a handful of SDEI events (and it would also be really,
really slow if each page fault had to be redirected through QEMU).  The
in-kernel state is four 64-bit values (EP address and argument, flags,
affinity) per event.

>> Yes, the SDEI specification already mentions this: the client handler
>> should have all required resources in place before the handler runs.
>> However, I don't see it as a problem so far.
>
> What if they are swapped out? This thing becomes re-entrant ... which the spec forbids.
> The host has no clue what is in guest memory.

On x86 we don't deliver the notification if interrupts are disabled.  On
ARM I guess you'd likewise hold it off until the guest has called
SDEI_EVENT_COMPLETE for the previous event (so yes, that would be some
state that has to be migrated).  In fact it would be nice if
SDEI_EVENT_COMPLETE meant "wait for synchronous page-in" while
SDEI_EVENT_COMPLETE_AND_RESUME meant "handle it asynchronously".
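
To make that split concrete, here is a rough, compilable toy sketch (not
kernel code; every name in it is invented for illustration) of the
decision a guest's async page fault handler could make under those
semantics:

#include <stdbool.h>
#include <stdio.h>

/* The two ways the guest could complete the "page not present" event. */
enum apf_completion {
        APF_COMPLETE,             /* ask the host to page in synchronously */
        APF_COMPLETE_AND_RESUME,  /* park the task and reschedule instead  */
};

/* Stand-in for "is the interrupted context allowed to sleep?" */
static bool may_reschedule(bool irqs_enabled, bool preemptible)
{
        return irqs_enabled && preemptible;
}

static enum apf_completion apf_sdei_handler(unsigned long token,
                                            bool irqs_enabled,
                                            bool preemptible)
{
        if (!may_reschedule(irqs_enabled, preemptible))
                return APF_COMPLETE;

        /* A real guest would queue the current task on a wait list keyed
         * by the host-provided token and wake it on "page ready". */
        printf("parking task on token %#lx\n", token);
        return APF_COMPLETE_AND_RESUME;
}

int main(void)
{
        /* Normal, preemptible context: reschedule and handle it later. */
        printf("%d\n", apf_sdei_handler(0x1, true, true));
        /* Interrupts off: fall back to a synchronous page-in. */
        printf("%d\n", apf_sdei_handler(0x2, false, true));
        return 0;
}

A real guest would of course derive the decision from its own context
(interrupt state, preempt count) and then complete the event with the
corresponding SDEI call.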

>> Let's wait and see whether it's a real issue once I post the RFC patchset :)
> 
> It's not really a "try it and see" thing!

On this we agree. ;)

Paolo