[PATCH v3 19/20] KVM: arm64: Handle RAS SErrors from EL2 on guest exit

James Morse james.morse at arm.com
Thu Oct 12 05:28:05 PDT 2017


Hi Marc,

On 11/10/17 11:37, Marc Zyngier wrote:
> On Thu, Oct 05 2017 at  8:18:11 pm BST, James Morse <james.morse at arm.com> wrote:
>> We expect to have firmware-first handling of RAS SErrors, with errors
>> notified via an APEI method. For systems without firmware-first, add
>> some minimal handling to KVM.
>>
>> There are two ways KVM can take an SError due to a guest, either may be a
>> RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
>> or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
>>
>> The current SError from EL2 code unmasks SError and tries to fence any
>> pending SError into a single instruction window. It then leaves SError
>> unmasked.
>>
>> With the v8.2 RAS Extensions we may take an SError for a 'corrected'
>> error, but KVM is only able to handle SError from EL2 if they occur
>> during this single instruction window...
>>
>> The RAS Extensions give us a new instruction to synchronise and
>> consume SErrors. The RAS Extensions document (ARM DDI0587),
>> '2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising
>> SError interrupts generated by 'instructions, translation table walks,
>> hardware updates to the translation tables, and instruction fetches on
>> the same PE'. This makes ESB equivalent to KVMs existing
>> 'dsb, mrs-daifclr, isb' sequence.
>>
>> Use the alternatives to synchronise and consume any SError using ESB
>> instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT
>> in the exit_code so that we can restart the vcpu if it turns out this
>> SError has no impact on the vcpu.

>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
>> index 12ee62d6d410..96caa5328b3a 100644
>> --- a/arch/arm64/kvm/hyp/entry.S
>> +++ b/arch/arm64/kvm/hyp/entry.S
>> @@ -124,6 +124,17 @@ ENTRY(__guest_exit)
>>  	// Now restore the host regs
>>  	restore_callee_saved_regs x2
>>  
>> +alternative_if ARM64_HAS_RAS_EXTN
>> +	// If we have the RAS extensions we can consume a pending error
>> +	// without an unmask-SError and isb.
>> +	esb
>> +	mrs_s	x2, SYS_DISR_EL1
>> +	str	x2, [x1, #(VCPU_FAULT_DISR - VCPU_CONTEXT)]
>> +	cbz	x2, 1f
>> +	msr_s	SYS_DISR_EL1, xzr
>> +	orr	x0, x0, #(1<<ARM_EXIT_WITH_SERROR_BIT)
>> +1:	ret
>> +alternative_else
>>  	// If we have a pending asynchronous abort, now is the
>>  	// time to find out. From your VAXorcist book, page 666:
>>  	// "Threaten me not, oh Evil one!  For I speak with
>> @@ -135,6 +146,8 @@ ENTRY(__guest_exit)
>>  
>>  	dsb	sy		// Synchronize against in-flight ld/st
>>  	msr	daifclr, #4	// Unmask aborts
>> +	nop
> 
> Oops. You've now introduced an instruction in what was supposed to be a
> single instruction window (the isb). It means that we may fail to
> identify the Serror as having been generated by our synchronisation
> mechanism, and we'll panic for no good reason.

Gah! and I pointed this thing out in the commit message,

>> +alternative_endif
>>  
>>  	// This is our single instruction exception window. A pending
>>  	// SError is guaranteed to occur at the earliest when we unmask

and here it is in the diff.


> Moving the nop up will solve this.

Yes.

I've fixed this for v4, along with Julien's suggestions.


Thanks!

James



More information about the linux-arm-kernel mailing list