[PATCH v3 19/20] KVM: arm64: Handle RAS SErrors from EL2 on guest exit
James Morse
james.morse at arm.com
Thu Oct 12 05:28:05 PDT 2017
Hi Marc,
On 11/10/17 11:37, Marc Zyngier wrote:
> On Thu, Oct 05 2017 at 8:18:11 pm BST, James Morse <james.morse at arm.com> wrote:
>> We expect to have firmware-first handling of RAS SErrors, with errors
>> notified via an APEI method. For systems without firmware-first, add
>> some minimal handling to KVM.
>>
>> There are two ways KVM can take an SError due to a guest, either may be a
>> RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
>> or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
>>
>> The current SError from EL2 code unmasks SError and tries to fence any
>> pending SError into a single instruction window. It then leaves SError
>> unmasked.
>>
>> With the v8.2 RAS Extensions we may take an SError for a 'corrected'
>> error, but KVM is only able to handle SError from EL2 if they occur
>> during this single instruction window...
>>
>> The RAS Extensions give us a new instruction to synchronise and
>> consume SErrors. The RAS Extensions document (ARM DDI0587),
>> '2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
>> SError interrupts generated by 'instructions, translation table walks,
>> hardware updates to the translation tables, and instruction fetches on
>> the same PE'. This makes ESB equivalent to KVMs existing
>> 'dsb, mrs-daifclr, isb' sequence.
>>
>> Use the alternatives to synchronise and consume any SError using ESB
>> instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
>> in the exit_code so that we can restart the vcpu if it turns out this
>> SError has no impact on the vcpu.
>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index 12ee62d6d410..96caa5328b3a 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -124,6 +124,17 @@ ENTRY(__guest_exit)
>> // Now restore the host regs
>> restore_callee_saved_regs x2
>>
>> +alternative_if ARM64_HAS_RAS_EXTN
>> + // If we have the RAS extensions we can consume a pending error
>> + // without an unmask-SError and isb.
>> + esb
>> + mrs_s x2, SYS_DISR_EL1
>> + str x2, [x1, #(VCPU_FAULT_DISR - VCPU_CONTEXT)]
>> + cbz x2, 1f
>> + msr_s SYS_DISR_EL1, xzr
>> + orr x0, x0, #(1<<ARM_EXIT_WITH_SERROR_BIT)
>> +1: ret
>> +alternative_else
>> // If we have a pending asynchronous abort, now is the
>> // time to find out. From your VAXorcist book, page 666:
>> // "Threaten me not, oh Evil one! For I speak with
>> @@ -135,6 +146,8 @@ ENTRY(__guest_exit)
>>
>> dsb sy // Synchronize against in-flight ld/st
>> msr daifclr, #4 // Unmask aborts
>> + nop
>
> Oops. You've now introduced an instruction in what was supposed to be a
> single instruction window (the isb). It means that we may fail to
> identify the Serror as having been generated by our synchronisation
> mechanism, and we'll panic for no good reason.
Gah! and I pointed this thing out in the commit message,
>> +alternative_endif
>>
>> // This is our single instruction exception window. A pending
>> // SError is guaranteed to occur at the earliest when we unmask
and here it is in the diff.
> Moving the nop up will solve this.
Yes.
I've fixed this for v4, along with Julien's suggestions.
Thanks!
James
More information about the linux-arm-kernel
mailing list