[Question] How to test the SDEI client driver

Gavin Shan gshan at redhat.com
Fri Jul 10 05:08:28 EDT 2020


Hi James,

On 7/10/20 4:31 AM, James Morse wrote:
> On 09/07/2020 06:33, Gavin Shan wrote:
>> On 7/9/20 2:49 AM, Paolo Bonzini wrote:
>>> On 08/07/20 18:11, James Morse wrote:
>>>> On 03/07/2020 01:26, Gavin Shan wrote:
> 
>>>>> For the SDEI
>>>>> events needed by the async page fault, they originate from KVM (the host). In order
>>>>> to achieve this, KVM needs some code so that SDEI events can be injected and
>>>>> delivered. Also, the SDEI-related hypercalls need to be handled as well.
>>>>
>>>> I avoided doing this because it makes it massively complicated for the VMM. All that
>>>> in-kernel state now has to be migrated. KVM has to expose APIs to let the VMM inject
>>>> events, which gets nasty for shared events where some CPUs are masked, and others aren't.
>>>>
>>>> Having something like Qemu drive the reference code from TFA is the right thing to do for
>>>> SDEI.
>>>
>>> Are there usecases for injecting SDEIs from QEMU?
>>>
>>> If not, it can be done much more easily with KVM (and it would also
>>> be really, really slow if each page fault had to be redirected
>>> through QEMU), which wouldn't have more than a handful of SDEI events.
>>> The in-kernel state is 4 64-bit values (EP address and argument, flags,
>>> affinity) per event.
> 
>> I don't think there is an existing use case to inject SDEIs from QEMU.
> 
> use-case or user-space?
> 
> There was a series to add support for emulating firmware-first RAS. I think it got stuck
> in the wider problem of how Qemu can consume reference code from TFA (the EL3 firmware) to
> reduce the maintenance overhead. (Every time Arm adds something else up there, Qemu would
> need to emulate it. It should be possible to consume the TFA reference code)
> 

I'm not sure if that patchset was ever posted. If so, could you
share a link to it? I'll take a look when I get a chance.

>> However, one ioctl command is reserved for this purpose
>> in my code, so that QEMU can inject SDEI event if needed.
>>
>> Yes, my implementation injects SDEI events directly from KVM, on
>> request received from a consumer like APF.
> 
>> By the way, I just finished splitting the code into RFC patches.
>> Please let me know if I should post it to provide more details, or if it
>> should be deferred until this discussion is finished.
> 
> I need to go through the SDEI patches you posted yet. If you post a link to the branch I
> can have a look to get a better idea of the shape of this thing...
> 
> (I've not gone looking for the x86 code yet)
> 

Sure. Here is the link to the git repo:

https://github.com/gwshan/linux.git

branch ("sdei_client"): the sdei client driver rework series I posted.
branch ("sdei"): the patches to make SDEI virtualized, which bases on "sdei_client".

>>>>> Yes, the SDEI specification already mentions
>>>>> this: the client handler should have all required resources in place before
>>>>> the handler runs. However, I don't see it as a problem so far.
>>>>
>>>> What if they are swapped out? This thing becomes re-entrant ... which the spec forbids.
>>>> The host has no clue what is in guest memory.
>>>
>>> On x86 we don't do the notification if interrupts are disabled.  On ARM
>>> I guess you'd do the same until SDEI_EVENT_COMPLETE (so yeah that would
>>> be some state that has to be migrated).  In fact it would be nice if
>>> SDEI_EVENT_COMPLETE meant "wait for synchronous page-in" while
>>> SDEI_EVENT_COMPLETE_AND_RESUME meant "handle it asynchronously".
> 
>> I'm not sure I understand this issue completely. When the vCPU is preempted,
>> all registers should have been saved to vcpu->arch.ctxt. The SDEI context is
>> saved to vcpu->arch.ctxt as well. They will be restored when the vCPU runs
>> again afterwards. From the semantics perspective, nothing is broken.
>>
>> Yes, I plan to use private event, which is only visible to kvm and guest.
>> Also, it has critical priority. The new SDEI event can't be delivered until
>> the previous critical event is finished.
>>
>> Paolo, it's an interesting idea to reuse SDEI_EVENT_COMPLETE/AND_RESUME. Do you
>> mean to use these two hypercalls to designate PAGE_NOT_READY and PAGE_READY
>> respectively? If so, please provide more details.
> 
> No, I think this suggestion is for the guest to hint back to the hypervisor whether it can
> take this stage2 delay, or it must have the page to make progress.
> 
> SDEI_EVENT_COMPLETE returns to wherever we came from, the arch code will do this if it
> couldn't have taken an IRQ. If it could have taken an IRQ, it uses
> SDEI_EVENT_COMPLETE_AND_RESUME to exit through the interrupt vector.
> 
> This is a trick that gives us two things: KVM guest exit when this is in use on real
> hardware, and the irq-work handler runs to do the work we couldn't do in NMI context, both
> before we return to the context that triggered the fault in the first place.
> Both are needed for the RAS support.
> 

Ok, thanks for the information, which makes things much clearer.
So SDEI_EVENT_COMPLETE or SDEI_EVENT_COMPLETE_AND_RESUME is issued
depending on whether the current process can be rescheduled. I think
that was Paolo's idea?
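
To make that concrete, here is a minimal sketch of the guest-side
decision, modeled loosely on do_sdei_event() in arch/arm64/kernel/sdei.c
(the sdei_api_event_complete*() wrappers and irq_vector_entry are
hypothetical names for the two hypercalls and the IRQ vector address):

    /* Runs in the NMI-like SDEI context; must not sleep. */
    static void sdei_apf_handler(struct pt_regs *regs)
    {
            if (!interrupts_enabled(regs)) {
                    /*
                     * We interrupted a context that couldn't have taken an
                     * IRQ, so return straight back to where we came from.
                     */
                    sdei_api_event_complete();
            } else {
                    /*
                     * We could have taken an IRQ: exit through the interrupt
                     * vector so pending work (including schedule()) runs
                     * before the faulting context resumes.
                     */
                    sdei_api_event_complete_and_resume(irq_vector_entry);
            }
    }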

> 
> The problem is invoking this whole thing when the guest can't do anything about it,
> because it can't schedule(). You can't know this from outside the guest.
> 

Yes, the interrupted process can't call schedule() before SDEI_EVENT_COMPLETE,
at least because the SDEI event handler has to finish as quickly as possible.
The flow looks like this:

              process  ->         SDEI event triggered
                                        |
                                  SDEI event handler is called
                                        |
             schedule() <-        SDEI_EVENT_COMPLETE

As we don't have a schedule() call in place in advance, we might need to
figure out a way for the SDEI event handler to arrange for schedule() to run.
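
For example, the handler could queue an irq_work (which James mentioned
above) and complete with SDEI_EVENT_COMPLETE_AND_RESUME, so the wakeup
runs from the IRQ exit path. A rough sketch, where apf_waiter is a
hypothetical pointer to the task parked on the faulting page:

    #include <linux/arm_sdei.h>
    #include <linux/irq_work.h>
    #include <linux/sched.h>

    static struct task_struct *apf_waiter;  /* hypothetical waiter */

    static void apf_wake_fn(struct irq_work *work)
    {
            /* Runs from the IRQ path after the SDEI handler completed,
             * so waking the task (and a later schedule()) is safe. */
            if (apf_waiter)
                    wake_up_process(apf_waiter);
    }

    static DEFINE_IRQ_WORK(apf_wake_work, apf_wake_fn);

    /* SDEI callback, invoked in NMI-like context. */
    static int sdei_apf_event_handler(u32 event, struct pt_regs *regs,
                                      void *arg)
    {
            irq_work_queue(&apf_wake_work);  /* NMI-safe deferral */
            return 0;
    }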

Thanks,
Gavin



