[Question] How to test the SDEI client driver

Gavin Shan gshan at redhat.com
Thu Jul 9 01:33:41 EDT 2020


Hi James and Paolo,

On 7/9/20 2:49 AM, Paolo Bonzini wrote:
> On 08/07/20 18:11, James Morse wrote:
>> On 03/07/2020 01:26, Gavin Shan wrote:
>>> On 7/1/20 9:57 PM, James Morse wrote:
>>>> On 30/06/2020 06:17, Gavin Shan wrote:

[...]

>>
>>> Sorry that I didn't explain the background last time. We plan to use SDEI to
>>> deliver the notification (signal) from host to guest that is needed by the
>>> asynchronous page fault feature. The RFCv2 patchset was posted a while ago [1].
>>
>> Thanks. So this is to hint to the guest that you'd swapped its memory to disk. Yuck.
>>
>> When would you do this?
> 
> These days, the main reason is on-demand paging with live migration.
> Instead of waiting to have a consistent version of guest memory on the
> destination, memory that the guest has dirtied can be copied on demand
> from source to destination while the guest is running.  Letting the
> guest reschedule is surprisingly effective in this case, especially with
> workloads that have a lot of threads.
> 

Paolo, thanks for the explanation.

[...]


>>> The SDEI
>>> events needed by the async page fault originate from KVM (the host). In order
>>> to achieve that, KVM needs some code so that SDEI events can be injected and
>>> delivered. Also, the SDEI-related hypercalls need to be handled as well.
>>
>> I avoided doing this because it makes it massively complicated for the VMM. All that
>> in-kernel state now has to be migrated. KVM has to expose APIs to let the VMM inject
>> events, which gets nasty for shared events where some CPUs are masked, and others aren't.
>>
>> Having something like Qemu drive the reference code from TFA is the right thing to do for
>> SDEI.
> 
> Are there use cases for injecting SDEIs from QEMU?
> 
> If not, it can be done much more easily with KVM (and it would also
> be really, really slow if each page fault had to be redirected
> through QEMU), which wouldn't have more than a handful of SDEI events.
> The in-kernel state is 4 64-bit values (EP address and argument, flags,
> affinity) per event.
> 

I don't think there is an existing use case for injecting SDEIs from QEMU.
However, one ioctl command is reserved for this purpose in my code, so
that QEMU can inject an SDEI event if needed.

Yes, my implementation injects the SDEI event directly from KVM, on a
request received from a consumer like the async page fault code.
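
For reference, the in-kernel state would roughly be the four 64-bit values
Paolo lists above. A minimal sketch of how it could be kept per event is
below; the struct and field names are invented for this mail and not taken
from the actual patches:

#include <linux/types.h>

/*
 * Illustrative only: roughly the per-event in-kernel state (entry point
 * address and argument, flags, affinity) mentioned above. Names are
 * made up for illustration.
 */
struct kvm_sdei_event_state {
	u64	ep_address;	/* handler entry point registered by the guest */
	u64	ep_arg;		/* argument handed to the handler when it runs */
	u64	flags;		/* registered/enabled state, routing mode, ... */
	u64	affinity;	/* target PE for a shared event */
};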

By the way, I just finished splitting the code into RFC patches.
Please let me know whether I should post them to provide more details,
or whether that should be deferred until this discussion is finished.

>>> Yes, the SDEI specification already mentions
>>> this: the client handler should have all required resources in place before
>>> the handler runs. However, I don't see it as a problem so far.
>>
>> What if they are swapped out? This thing becomes re-entrant ... which the spec forbids.
>> The host has no clue what is in guest memory.
> 
> On x86 we don't do the notification if interrupts are disabled.  On ARM
> I guess you'd do the same until SDEI_EVENT_COMPLETE (so yeah that would
> be some state that has to be migrated).  In fact it would be nice if
> SDEI_EVENT_COMPLETE meant "wait for synchronous page-in" while
> SDEI_EVENT_COMPLETE_AND_RESUME meant "handle it asynchronously".
> 

I'm not sure I understand this issue completely. When the vCPU is preempted,
all registers should have been saved to vcpu->arch.ctxt. The SDEI context is
saved to vcpu->arch.ctxt as well. They will be restored when the vCPU runs
again afterwards. From that perspective, the semantics aren't broken.

Yes, I plan to use a private event, which is only visible to KVM and the guest.
Also, it has critical priority: a new SDEI event can't be delivered until the
previous critical event has finished.
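
Putting the two paragraphs above together, the per-vCPU bookkeeping is
roughly the sketch below (names invented for illustration, not the actual
code): the interrupted state is snapshotted when the event is delivered, and
no further critical event is injected until the guest issues
SDEI_EVENT_COMPLETE.

#include <linux/types.h>

/*
 * Illustrative sketch only: the pieces of interrupted state that event
 * delivery clobbers are saved at injection time, and a busy flag keeps
 * a second critical event from being delivered before the first one has
 * been completed by the guest.
 */
struct kvm_sdei_vcpu_state {
	u64	saved_regs[4];		/* interrupted x0..x3 */
	u64	saved_pc;
	u64	saved_pstate;
	bool	critical_event_active;	/* set at delivery, cleared on COMPLETE */
};

static bool kvm_sdei_can_inject(struct kvm_sdei_vcpu_state *state)
{
	/* A new critical event must wait for the previous one to complete */
	return !state->critical_event_active;
}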

Paolo, it's an interesting idea to reuse SDEI_EVENT_COMPLETE/_AND_RESUME. Do you
mean to use these two hypercalls to signal PAGE_NOT_READY and PAGE_READY
respectively? If possible, please provide more details.
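
If I read the suggestion correctly, a host-side sketch would look like the
following (the apf_* helpers are invented for illustration and are not part
of KVM or the posted patches):

#include <linux/arm_sdei.h>
#include <linux/kvm_host.h>

/* Hypothetical helpers, declared only to keep the sketch self-contained */
static int apf_page_in_synchronously(struct kvm_vcpu *vcpu);
static int apf_queue_async_page_in(struct kvm_vcpu *vcpu);

/*
 * Illustrative only: the guest's choice of completion hypercall tells
 * the host how to resolve the faulted page.
 */
static int handle_apf_sdei_completion(struct kvm_vcpu *vcpu, u32 func_id)
{
	switch (func_id) {
	case SDEI_1_0_FN_SDEI_EVENT_COMPLETE:
		/* Guest cannot make progress: page the memory in now and
		 * only then resume the interrupted context. */
		return apf_page_in_synchronously(vcpu);
	case SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME:
		/* Guest parked the faulting task: let it run something else
		 * and raise a PAGE_READY notification once the page is in. */
		return apf_queue_async_page_in(vcpu);
	}

	return -EINVAL;
}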

[...]

Thanks,
Gavin



