[PATCH v3 7/8] arm64: exception: handle asynchronous SError interrupt

Thu Apr 27 22:55:51 EDT 2017

Hi James,

Thanks for your explanation and suggests.

On 2017/4/25 1:14, James Morse wrote:
> Hi Wang Xiongfeng,
> 
> On 21/04/17 12:33, Xiongfeng Wang wrote:
>> On 2017/4/20 16:52, James Morse wrote:
>>> On 19/04/17 03:37, Xiongfeng Wang wrote:
>>>> On 2017/4/18 18:51, James Morse wrote:
>>>>> The host expects to receive physical SError Interrupts. The ARM-ARM doesn't
>>>>> describe a way to inject these as they are generated by the CPU.
>>>>>
>>>>> Am I right in thinking you want this to use SError Interrupts as an APEI
>>>>> notification? (This isn't a CPU thing so the RAS spec doesn't cover this use)
>>>>
>>>> Yes, using sei as an APEI notification is one part of my consideration. Another use is for ESB.
>>>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>>>> describes the SEI recovery process when ESB is implemented.
>>>>
>>>> In this situation, SEI is routed to EL3 (SCR_EL3.EA = 1). When an SEI occurs in EL0 and not been taken immediately,
>>>> and then an ESB instruction at SVC entry is executed, SEI is taken to EL3. The ESB at SVC entry is
>>>> used for preventing the error propagating from user space to kernel space. The EL3 SEI handler collects
>>>
>>>> the errors and fills in the APEI table, and then jump to EL2 SEI handler. EL2 SEI handler inject
>>>> an vSEI into EL1 by setting HCR_EL2.VSE = 1, so that when returned to OS, an SEI is pending.
>>>
>>> This step has confused me. How would this work with VHE where the host runs at
>>> EL2 and there is nothing at Host:EL1?
>>
>> RAS spec 6.5.3 'Example software sequences: Variant: asynchronous External Abort with ESB'
>> I don't know whether the step described in the section is designed for guest os or host os or both.
>> Yes, it doesn't work with VHE where the host runs at EL2.
> 
> If it uses Virtual SError, it must be for an OS running at EL1, as the code
> running at EL2 (VHE:Linux, firmware or Xen) must set HCR_EL2.AMO to make this work.
> 
> I can't find the section you describe in this RAS spec:
> https://developer.arm.com/docs/ddi0587/latest/armreliabilityavailabilityandserviceabilityrasspecificationarmv8forthearmv8aarchitectureprofile
> 
I got the RAS spec from my company's ARM account. The document number is PRD03-PRDC-010953 .

> 
>>> >From your description I assume you have some firmware resident at EL2.
> 
>> Our actual SEI step is planned as follows:
> 
> Ah, okay. You can't use Virtual SError at all then, as firmware doesn't own EL2:
> KVM does. KVM will save/restore the HCR_EL2 register as part of its world
> switch. Firmware must not modify the hyp registers behind the hyper-visors back.
> 
> 
>> Host OS:  EL0/EL1 -> EL3 -> EL0/EL1
> 
> i.e. While the host was running, so EL3 sees HCR_EL2.AMO clear, and directs the
> SEI to EL1.
> 
>> Guest OS:  EL0/EL1 -> EL3 -> EL2 -> EL0/EL1
> 
> i.e. While the guest was running, so EL3 sees HCR_EL2.AMO set, directs the SEI
> to EL2 where KVM performs the world-switch and returns to host EL1.
> 
> Looks sensible.
> 
> 
>> In guest os situation, we can inject an vSEI and return to where the SEI is taken from.
> 
> An SEI from firmware should always end up in the host OS. If a guest was running
> KVM is responsible for doing the world switch to restore host EL1 and return
> there. This should all work today.
> 
> 
>> But in host os situation, we can't inject an vSEI (if we don't set HCR_EL2.AMO), so we have to jump to EL1 SEI vector.
> 
> Because your firmware is at EL3 you have to check PSTATE.A and jump into the
> SError vector in both cases, you just choose if its VBAR_EL2 or VBAR_EL1 based
> on HCR_EL2.AMO.
> 
> 
>> Then the process following ESB won't be executed becuase SEI is taken to EL3 from the ESB instruction in EL1, and when control
>> is returned to OS, we are in EL1 SEI vector rather than the ESB instruction.
> 
> Firmware should set ELR_EL1 so that an eret from the EL1 SError vector takes you
> back to the ESB instruction that took us to EL3 in the first place.
> 
> 
> There is a problem here mixing SError as the CPU->Software notification of RAS
> errors and as APEI's SEI Software->Software notification that a firmware-first
> error has been handled by EL3.
> 
> To elaborate:
> The routine in this patch was something like:
> 1 ESB
> 2 Read DISR_EL1
> 3 If set, branch to the SError handler.
> 
> If we have firmware-first ESB will generate an SError as PSTATE.A at EL{1,2}
> doesn't mask SError for EL3. Firmware can then handle the RAS error. With this
> the CPU->Software story is done. Firmware notifying the OS via APEI is a
> different problem.
> 
> If the error doesn't need to be notified to the OS via SEI, firmware can return
> to step 1 above with DISR_EL1 clear. The OS may be told via another mechanism
> such as polling or an irq.
> 
> If PSTATE.A was clear, firmware can return to the SError vector. If PSTATE.A was
> set and firmware knows this was due to an ESB, it can set DISR_EL1. The problem
> is firmware can't know if the SError was due to an ESB.

Yes, if implicit ESB is not implemented, we can't know if the sError was due to
an ESB unless we only set PSTATE.A when ESB is executed.
> 
> The only time we are likely to have PSTATE.A masked is when changing exception
> level. For this we can rely on IESB, so step 1 isn't necessary in the routine
> above. Firmware knows if an SError taken to EL3 is due to an IESB, as the
> ESR_EL3.IESB bit will be set, in this case it can write to DISR_EL1 and return.
> 
> In Linux we can use IESB for changes in exception level, and ESB with PSTATE.A
> clear for any other need. (e.g. __switch_to()). Firmware can notify us using SEI
> if we use either of these two patterns. This may not work for other OS (Xen?),
> but this is a problem with using SEI as an APEI notification method.
> 
> (Some cleanup of our SError unmasking is needed, I have some patches I will post
> on a branch shortly)
> 
> 
>> It is ok to just ignore the process following the ESB instruction in el0_sync, because the process will be sent SIGBUS signal.
> 
> I don't understand. How will Linux know the process caused an error if we
> neither take an SError nor read DISR_EL1 after an ESB?
> 
I think there may be some misunderstanding here. The ESB instruction is placed in kernel_entry
of el0_sync and el0_irq. For the el0_sync, such as an syscall from userspace, after ESB is executed,
we check whether DISR.A is set. If it is not set, we go on to process the syscall. If it is set, we
jump to sError vector and then just eret.
But for el0_irq, after ESB is executed, we check whether DISR.A is set. If it is set, we jump to
(using BL) sError vector, process SEI, return back to irq handler and process irq, and then eret.

> 
>> But for el0_irq, the irq process following the ESB may should be executed because the irq is not related to the user process.
> 
> If we exit EL0 for an IRQ, and take an SError due to IESB we know the fault was
> due to EL0, not the exception level we entered to process the IRQ.
> 
> 
>> If we set HCR_EL2.AMO when SCR_EL3.EA is set. We can still inject an vSEI into host OS in EL3.
> 
> (from EL3?) Only if the host OS is running at EL1 and isn't using EL2. Firmware
> can't know if we're using EL2 so this can't work.
> 
> 
>> Physical SError won't be taken to EL2 because SCR_EL3.EA is set. But this may be too racy is
>> not consistent with ARM rules.
> 
> EL3 flipping bits in EL2 system registers behind EL2 software's back is going to
> have strange side effects. e.g. some Hypervisor may not change HCR_EL2 when
> switching VM-A to VM-B as the configuration is the same; now the virtual SError
> will be delivered to the wrong VM. (KVM doesn't do this, but Xen might.)
> 
> 
>>>>> You cant use SError to cover all the possible RAS exceptions. We already have
>>>>> this problem using SEI if PSTATE.A was set and the exception was an imprecise
>>>>> abort from EL2. We can't return to the interrupted context and we can't deliver
>>>>> an SError to EL2 either.
>>>>
>>>> SEI came from EL2 and PSTATE.A is set. Is it the situation where VHE is enabled and CPU is running
>>>> in kernel space. If SEI occurs in kernel space, can we just panic or shutdown.
>>>
>>> firmware-panic? We can do a little better than that:
>>> If the IESB bit is set in the ESR we can behave as if this were an ESB and have
>>> firmware write an appropriate ESR to DISR_EL1 if PSTATE.A is set and the
>>> exception came from EL2.
>>
>> This method may solve the problem I said above.
>> Is IESB a new bit added int ESR in the newest RAS spec?
> 
> The latest ARM-ARM (DDI0487B.a) describes it as part of the "RAS Extension in
> ARMv8.2" in section 'A1.7.5 The Reliability, Availability, and Serviceability
> (RAS) Extension'.
> 
> Jumping to 'D7.2.28 ESR_ELx, Exception Syndrome Register (ELx)', then page
> D7-2284 for the 'ISS encoding for an SError interrupt', it indicates "the SError
> interrupt was synchronized by the implicit [ESB] and taken immediately"
> 
> 
> As above, with IESB for changes in exception level, and unmasking PSTATE.A for
> any other use of ESB, Linux can make sure that firmware can always notify us of
> an APEI event using SEI.
> 
> Another OS (or even UEFI) may not do the same, so firmware should be prepared
> for ESB to be executed with PSTATE.A masked (i.e, not using IESB). If there is
> no wider firmware-policy for this (reboot, notify a remote CPU), then returning
> to the faulting location and trying to inject SError later is all we can do.
> 
> 
> Thanks,
> 
> James
> 
> .
> 
Thanks,
Wang Xiongfeng