[PATCH v4 00/21] SError rework + RAS&IESB for firmware first support

gengdongjiu gengdongjiu at huawei.com
Fri Nov 10 04:03:48 PST 2017



On 2017/11/10 2:14, James Morse wrote:
> Hi guys,
> 
> On 19/10/17 15:57, James Morse wrote:
>> Known issues:
> [...]
>>  * KVM-Migration: VDISR_EL2 is exposed to userspace as DISR_EL1, but how should
>>    HCR_EL2.VSE or VSESR_EL2 be migrated when the guest has an SError pending but
>>    hasn't taken it yet...?
> 
> I've been trying to work out how this pending-SError-migration could work.


Hi James,
  I have finished developing the Qemu part of the RAS support and sent the patches out. I think the solution follows your suggestions and other people's suggestions from the mail discussion.
For example: it does not pass KVM exception information to Qemu; it uses a different notification type depending on the SIGBUS type (BUS_MCEERR_AR or BUS_MCEERR_AO);
and it creates the guest APEI table and records CPER at runtime for the guest, etc.
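
As a rough illustration of the SIGBUS-type dispatch mentioned above (a minimal sketch only, not code from the Qemu patches): the enum guest_notify values and the function names are hypothetical, and which concrete APEI notification each SIGBUS type maps to is an assumption left abstract here. Only BUS_MCEERR_AR and BUS_MCEERR_AO are real Linux si_code values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Fallbacks in case the C library headers do not expose the Linux values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical notification kinds, named for illustration only. */
enum guest_notify {
    GUEST_NOTIFY_NONE,   /* not a memory-error SIGBUS, nothing to report */
    GUEST_NOTIFY_SYNC,   /* assumed: "action required", report right away */
    GUEST_NOTIFY_ASYNC   /* assumed: "action optional", report later */
};

/* Pick a notification kind from the SIGBUS si_code. */
static enum guest_notify notify_for_si_code(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return GUEST_NOTIFY_SYNC;
    case BUS_MCEERR_AO:
        return GUEST_NOTIFY_ASYNC;
    default:
        return GUEST_NOTIFY_NONE;
    }
}

/* Convenience wrapper for use from a SIGBUS handler. */
static enum guest_notify notify_for_siginfo(const siginfo_t *info)
{
    assert(info != NULL);
    return notify_for_si_code(info->si_code);
}
```

The point is only that the decision is made from the signal's si_code, so no KVM exception information has to reach Qemu.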

How about you have a look at this implementation, and then we discuss this migration again? Thanks.



> 
> If HCR_EL2.VSE is set then the guest will take a virtual SError when it next
> unmasks SError. Today this doesn't get migrated, but only KVM sets this bit as
> an attempt to kill the guest.
> 
> This will be more of a problem with GengDongjiu's SError CAP for triggering
> guest SError from user-space, which will also allow the VSESR_EL2 to be
> specified. (this register becomes the guest ESR_EL1 when the virtual SError is
> taken and is used to emulate firmware-first's NOTIFY_SEI and eventually
> kernel-first RAS). These errors are likely to be handled by the guest.
> 
> 
> We don't want to expose VSESR_EL2 to user-space, and for migration it isn't
> enough as a value of '0' doesn't tell us if HCR_EL2.VSE is set.
> 
> To get out of this corner: why not declare pending-SError-migration an invalid
> thing to do?
> 
> We can give Qemu a way to query if a virtual SError is (still) pending. Qemu
> would need to check this on each vcpu after migration, just before it throws the
> switch and the guest runs on the new host. This way the VSESR_EL2 value doesn't
> need migrating at all.
> 
> In the ideal world, Qemu could re-inject the last SError it triggered if there
> is still one pending when it migrates... but because KVM injects errors too, it
> would need to block migration until this flag is cleared.
> KVM can promise this doesn't change unless you run the vcpu, so provided the
> vcpu actually takes the SError at some point this thing can still be migrated.
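
The check described in the quoted text could be modelled roughly as below. This is a toy sketch under stated assumptions, not an existing KVM or Qemu interface: struct model_vcpu and all model_* functions are hypothetical, serror_pending stands in for HCR_EL2.VSE being set, and running the vcpu is assumed to let the guest unmask and take the SError.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vcpu: serror_pending stands in for HCR_EL2.VSE being set. */
struct model_vcpu {
    bool serror_pending;
};

/* Hypothetical query for "is a virtual SError still pending?". */
static bool model_serror_pending(const struct model_vcpu *vcpu)
{
    return vcpu->serror_pending;
}

/*
 * In this model the flag changes only when the vcpu runs, and we assume
 * the guest unmasks SError and takes it during that run.
 */
static void model_run_vcpu(struct model_vcpu *vcpu)
{
    vcpu->serror_pending = false;
}

/* Migration may complete only if no vcpu has a virtual SError pending. */
static bool model_can_migrate(const struct model_vcpu *vcpus, size_t n)
{
    assert(vcpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (model_serror_pending(&vcpus[i]))
            return false;
    }
    return true;
}
```

With two vcpus where one has an SError pending, model_can_migrate() reports false until that vcpu has been run, which matches the "block migration until this flag is cleared" idea.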
> 
> This does make the VSE machinery hidden unmigratable state in KVM, which is nasty.
> 
> Can anyone suggest a better way?
> 
> 
> Thanks,
> 
> James
> 
> .
> 



