[PATCH] kvm: pass the virtual SEI syndrome to guest OS

Wed Mar 22 06:37:23 PDT 2017

Hi James,
  Thank you very much for your detailed comments and answers.

On 2017/3/21 21:10, James Morse wrote:
> Hi,
> 
> On 21/03/17 06:32, gengdongjiu wrote:
>> On 2017/3/20 23:08, James Morse wrote:
>>> On 20/03/17 13:58, Marc Zyngier wrote:
>>>> On 20/03/17 12:28, gengdongjiu wrote:
>>>>> On 2017/3/20 19:24, Marc Zyngier wrote:
>>>>>> On 20/03/17 07:55, Dongjiu Geng wrote:
>>>>>>> In the RAS implementation, the hardware passes the virtual SEI
>>>>>>> syndrome information through VSESR_EL2, so set the virtual
>>>>>>> SEI syndrome from the physical SEI syndrome el2_elr to pass it to
>>>>>>> the guest OS
> 
> (I've juggled the order of your replies:)
> 
>> So for both SEA and SEI, do you prefer the steps below?
>> EL0/EL1 SEI/SEA ---> EL3 firmware first handle ---> EL2 hypervisor notify
>> the Qemu to inject SEI/SEA ---> Qemu call KVM API to inject SEA/SEI ---> KVM
>> inject SEA/SEI to guest OS
> 
> Yes, to expand your EL2 hypervisor notify Qemu step:
> 1 The host should call its APEI code to parse the CPER records.
> 2 User space processes are then notified via SIGBUS (or for rasdaemon, trace
>   points).
> 3 Qemu can take the address delivered via SIGBUS and generate CPER records for
>   the guest. It knows how to convert host addresses to guest IPAs, and it knows
>   where in guest memory to write the CPER records.
> 4 Qemu can then notify the guest via whatever mechanism it advertised via the
>   HEST/GHES table. It might not be the same mechanism that the host received
>   the notification through.
> 
> Steps 1 and 2 are the same even if no guest is running, so we don't have to add
> any special case for KVM. This is existing code that x86 uses.
> We can test the Qemu parts without any firmware support and the APEI path in the
> host and guest is the same.
Do you mean mapping the host APEI table into the guest in order to test steps 1 and 2,
so that the APEI path in the host and guest is the same?

> 
> 
>>> Is anyone from Huawei looking at adding RAS support for Qemu?
>>  yes, I am looking at Qemu and want to add RAS support.
> 
> Great, support in Qemu is one of the missing pieces. On x86 it looks like it
> emulates machine-check-exceptions, which is how x86 did this before
> firmware-first and APEI became the standard.
> 
> 
>>  do you mean let Qemu inject both the SEA and SEI?
> 
> To do the notification, yes. It needs to happen after the CPER records have been
> written, and the mechanism and CPER memory location need to match what the guest
> was told via the HEST/GHES table.
> 
> If Qemu didn't tell the guest about firmware-first, it can still deliver the
> guest an SError Interrupt.
> 
> 
> SEA should be possible to do with the KVM_SET_REG API, GPIO/GSIV and the other
> kind of interrupts can use irqfd. For SEI we may need to add an API call to KVM
> to let it pend SError with a specific ESR.
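To check that I understand this correctly, here is a minimal sketch of how Qemu
might choose the injection path from what it advertised to the guest. The struct,
enum and function names are hypothetical and invented only for this illustration;
the notification type numbers are the ones I read in the ACPI 6.1 HEST
"Hardware Error Notification" structure.

```c
#include <assert.h>
#include <stddef.h>

/* HEST notification types (ACPI 6.1), only the ones discussed here. */
enum hest_notify {
    HEST_NOTIFY_GPIO = 7,
    HEST_NOTIFY_SEA  = 8,
    HEST_NOTIFY_SEI  = 9,
    HEST_NOTIFY_GSIV = 10,
};

/* Hypothetical injection paths, named only for this sketch. */
enum inject_path {
    INJECT_NONE,
    INJECT_PLAIN_SERROR, /* guest was not told about firmware-first */
    INJECT_SET_REGS,     /* SEA: set the vCPU registers from user space */
    INJECT_IRQFD,        /* GPIO/GSIV: ordinary interrupt through irqfd */
    INJECT_SERROR_ESR,   /* SEI: would need a new KVM call taking an ESR */
};

/* Hypothetical summary of what Qemu advertised in the guest's HEST/GHES. */
struct guest_ras_config {
    int firmware_first;       /* non-zero if a GHES entry was advertised */
    enum hest_notify notify;  /* notification type in that GHES entry */
};

/* Pick the path to use once the CPER records are in guest memory. */
enum inject_path pick_inject_path(const struct guest_ras_config *cfg)
{
    assert(cfg != NULL);

    if (!cfg->firmware_first)
        return INJECT_PLAIN_SERROR;

    switch (cfg->notify) {
    case HEST_NOTIFY_SEA:
        return INJECT_SET_REGS;
    case HEST_NOTIFY_GPIO:
    case HEST_NOTIFY_GSIV:
        return INJECT_IRQFD;
    case HEST_NOTIFY_SEI:
        return INJECT_SERROR_ESR;
    default:
        return INJECT_NONE;
    }
}
```

The point of the sketch is only that the choice depends on the guest's table, not
on how the host itself received the notification.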
> 
> 
> 
>>> How does this work with firmware first?
> 
>> When the guest OS triggers an SEI, it first traps to EL3 firmware. EL3 firmware records the error
>> info to the APEI table,
> 
> These are CPER records in a memory area pointed to by one of HEST's GHES entries?
> 
> 
>> then copy the ESR_EL3 ELR_EL3 to ESR_EL2 ELR_EL2 and transfers control to the
>> hypervisor, hypervisor delegates the error exception to EL1 guest
> 
> This is a problem, just because the error occurred while the guest was running
> doesn't mean we should deliver it directly to the guest. Some of these errors
> will be fatal for the CPU and the host should try and power it off to contain
> the fault.
Yes, some errors do not need to be delivered to the guest OS directly. For example, if the error is a
guest kernel fault, the hypervisor can directly power off the whole guest OS.

> For example: CPER's 'micro-architectural error', should the guest
> power-off the vCPU? All that really does is return to the hypervisor, the error
> hasn't been contained.
For this example, I think it is better for the hypervisor to shut down the whole guest OS directly,
instead of the guest powering off the vCPU.

> 
> Firmware should handle the error first, then the host, finally the guest via Qemu.
> 
> 
>> OS by setting HCR_EL2.VSE to 1 and passes the virtual SEI syndrome through VSESR_EL2.
>> The EL1 guest OS checks the DISR_EL1 syndrome information to decide whether to
>> terminate the application or take some other recovery action. Because HCR_EL2.AMO is set, a read of
>> DISR_EL1 in fact returns VDISR_EL2, and VDISR_EL2 is loaded from VSESR_EL2, so here I pass the virtual SEI
>> syndrome through VSESR_EL2.
> 
> So this is how an SError Interrupt's ESR gets into a guest. How does it get hold
> of the CPER records?
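For clarity, the step I described amounts to the following, modelled here on plain
variables rather than on the real system registers. As you point out, it carries
only the syndrome, so the CPER records are not covered by it. The struct and
function names are hypothetical; the bit positions (HCR_EL2.AMO bit 5, HCR_EL2.VSE
bit 8, ESR ISS in bits [24:0]) are as I read them in the architecture manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCR_EL2_AMO  (UINT64_C(1) << 5)   /* route physical SError to EL2 */
#define HCR_EL2_VSE  (UINT64_C(1) << 8)   /* pend a virtual SError for the guest */
#define ESR_ISS_MASK UINT64_C(0x1ffffff)  /* syndrome bits [24:0], without EC and IL */

/* Hypothetical per-vCPU shadow of the two EL2 registers. */
struct vcpu_el2_state {
    uint64_t hcr_el2;
    uint64_t vsesr_el2;
};

/*
 * Pend a virtual SEI carrying the syndrome of the physical one.
 * Only the ISS part of the physical ESR is copied; the exception
 * class and IL bits are dropped.
 */
void pend_virtual_sei(struct vcpu_el2_state *v, uint64_t phys_esr)
{
    assert(v != NULL);

    v->vsesr_el2 = phys_esr & ESR_ISS_MASK;
    v->hcr_el2 |= HCR_EL2_VSE;
}
```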
> 
> 
>>> If we took a Physical SError Interrupt the CPER records are in the hosts memory.
>>> To deliver a RAS event to the guest something needs to generate CPER records and
>>> put them in the guest memory. Only Qemu knows where these memory regions are.
>>>
>>> Put another way, what is the guest expected to do with this SError interrupt?
>>
>> No, we do not only panic. If it is an EL0 application SEI, the OS error recovery
>> agent will terminate the EL0 application to isolate the error. If it is an EL1 guest
>> OS SError, the guest OS can see whether it can recover. If the error was in a read-only file cache buffer, the guest OS
>> can invalidate the page and reload the data from disk.
> 
> How do we get an address for memory failure? SError is asynchronous, I don't
> think it sets the FAR. (SEA is synchronous and its not guaranteed to set the
> FAR..). As far as I understand this information is in the CPER records in host
> memory.
Thank you for pointing that out. Sorry, my answer was not right. In fact, I think neither the FAR nor the
CPER record is accurate for an asynchronous SError, so the guest OS cannot try to recover.
It can still identify which application caused the SError that was deferred by an ESB, and then terminate that application.
By the way, for a synchronous SEA, which address do you think should be used: the FAR, or the CPER record that comes from ERR<n>ADDR?
I see that the Qualcomm patch series mainly uses the FAR rather than the CPER record that comes from ERR<n>ADDR for SEA,
so for the SEA case I do not know which of the two addresses is more accurate.

> 
> If we did have an address it would be a host address, how is it converted to a
> guest IPA? I think Qemu should do this as part of its CPER record generation,
> once the host has decided the error wasn't catastrophic.
Thanks for your suggestion.
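To make sure I understand the conversion, here is a minimal sketch. It assumes Qemu
keeps a table of guest RAM slots, each recording where the slot is mapped in Qemu's
own address space and where it appears in the guest. The struct and function names
are hypothetical, not existing Qemu code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one guest RAM region. */
struct mem_slot {
    uint64_t hva_start;  /* host virtual address of the mapping in Qemu */
    uint64_t gpa_start;  /* guest IPA at which the region is visible */
    uint64_t size;       /* length of the region in bytes */
};

/*
 * Translate a host virtual address (for example the one delivered with
 * SIGBUS) into a guest IPA. Returns 0 and stores the result in *gpa on
 * success, or -1 if the address does not belong to guest RAM.
 */
int hva_to_gpa(const struct mem_slot *slots, size_t nslots,
               uint64_t hva, uint64_t *gpa)
{
    assert(gpa != NULL);

    for (size_t i = 0; i < nslots; i++) {
        const struct mem_slot *s = &slots[i];

        /* Compare offsets instead of end addresses to avoid overflow. */
        if (hva >= s->hva_start && hva - s->hva_start < s->size) {
            *gpa = s->gpa_start + (hva - s->hva_start);
            return 0;
        }
    }
    return -1;
}
```

If the lookup fails, the faulting address is not guest memory, so I assume no CPER
record should be generated for the guest in that case.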

> 
> 
> Thanks,
> 
> James
> 
> 
>