[PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error

Fri May 5 05:31:18 PDT 2017

HI James,

2017-05-05 0:52 GMT+08:00 gengdongjiu <gengdj.1984 at gmail.com>:
> Dear James,
>    Thanks a lot for your review and comments. I am very sorry for the
> late response.
>
>
> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1984 at gmail.com>:
>>  Hi Dongjiu Geng,
>>
>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>> when happen SEA, deliver signal bus and handle the ioctl that
>>> inject SEA abort to guest, so that guest can handle the SEA error.
>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 105b6ab..a96594f 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -20,8 +20,10 @@
>>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>>   __coherent_cache_guest_page(vcpu, pfn, size);
>>>  }
>>>
>>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool hwpoison)
>>> +{
>>> + siginfo_t info;
>>> +
>>> + info.si_signo   = SIGBUS;
>>> + info.si_errno   = 0;
>>> + if (hwpoison)
>>> + info.si_code    = BUS_MCEERR_AR;
>>> + else
>>> + info.si_code    = 0;
>>> +
>>> + info.si_addr    = (void __user *)address;
>>> + if (hugetlb)
>>> + info.si_addr_lsb = PMD_SHIFT;
>>> + else
>>> + info.si_addr_lsb = PAGE_SHIFT;
>>> +
>>> + send_sig_info(SIGBUS, &info, current);
>>> +}
>>> +
>> «  [hide part of quote]
>>
>> Punit reviewed the other version of this patch, this PMD_SHIFT is not the right
>> thing to do, it needs a more accurate set of calls and shifts as there may be
>> hugetlbfs pages other than PMD_SIZE.
>>
>> https://www.spinics.net/lists/arm-kernel/msg568919.html
>>
>> I haven't posted a new version of that patch because I was still hunting a bug
>> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT
>> returned to userspace instead of this hwpoison code being invoked.
>
>   Ok, got it, thanks for your information.
>>
>> Please avoid duplicating functionality between patches, it wastes reviewers
>> time, especially when we know there are problems with this approach.
>>
>>
>>> +static void kvm_handle_bad_page(unsigned long address,
>>> + bool hugetlb, bool hwpoison)
>>> +{
>>> + /* handle both hwpoison and other synchronous external Abort */
>>> + if (hwpoison)
>>> + kvm_send_signal(address, hugetlb, true);
>>> + else
>>> + kvm_send_signal(address, hugetlb, false);
>>> +}
>>
>> Why the extra level of indirection? We only want to signal userspace like this
>> from KVM for hwpoison. Signals for RAS related reasons should come from the bits
>> of the kernel that decoded the error.
>
> For the SEA, the are maily two types:
> 0b010000 Synchronous External Abort on memory access.
> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
> encode the level.
>
> hwpoison should belong to the  "Synchronous External Abort on memory access"
> if the SEA type is not hwpoison, such as page table walk, do you mean
> KVM do not deliver the SIGBUS?
> If so, how the KVM handle the SEA type other than hwpoison?
>
>>
>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>>
>>
>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>>> index b37446a..780e3c4 100644
>>> --- a/arch/arm64/kvm/guest.c
>>> +++ b/arch/arm64/kvm/guest.c
>>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>>   return -EINVAL;
>>>  }
>>>
>>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu)
>>> +{
>>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>>> +
>>> + return 0;
>>> +}
>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index bb02909..1d2e2e7 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>>  /* Available with KVM_CAP_X86_SMM */
>>>  #define KVM_SMI                   _IO(KVMIO,   0xb7)
>>> +#define KVM_ARM_SEA               _IO(KVMIO,   0xb8)
>>>
>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>>
>>
>> Why do we need a userspace API for SEA? It can also be done by using
>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
>> way is you can choose which ESR value to use.
>>
>> Adding a new API call to do something you could do with an old one doesn't look
>> right.
>
> James, I considered your suggestion before that use the
> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
> not have difference to use the alread existed KVM API.  so may be
> changing the vcpu registers in qemu will duplicate with the KVM APIs.
>
> injection a SEA is no more than setting some registers: elr_el1, PC,
> PSTATE, SPSR_el1, far_el1, esr_el1
> I seen this KVM API do the same thing as Qemu.  do you found call this
> API will have issue and necessary to choose another ESR value?

I consider again, you are right.

when guest OS happen an SEA, My current solution is shown below:

(1) host EL3 firmware firstly handle the SEA error and generate the CPER record.
(2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3,
far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2.
(3) then jump the EL2 hypervisor

so the EL2 hypervisor uses the ESR that come from esr_el3,  here the
ESR(esr_el3) value may be different with the exist KVM API's ESR.

>
> I pasted the alread existed KVM API code:
>
> static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned
> long addr)
> {
>  unsigned long cpsr = *vcpu_cpsr(vcpu);
>  bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>  u32 esr = 0;
>  *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
>  *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
>  *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
>  *vcpu_spsr(vcpu) = cpsr;
>  vcpu_sys_reg(vcpu, FAR_EL1) = addr;
>  /*
>   * Build an {i,d}abort, depending on the level and the
>   * instruction set. Report an external synchronous abort.
>   */
>  if (kvm_vcpu_trap_il_is32bit(vcpu))
>   esr |= ESR_ELx_IL;
>  /*
>   * Here, the guest runs in AArch64 mode when in EL1. If we get
>   * an AArch32 fault, it means we managed to trap an EL0 fault.
>   */
>  if (is_aarch32 || (cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
>   esr |= (ESR_ELx_EC_IABT_LOW << ESR_ELx_EC_SHIFT);
>  else
>   esr |= (ESR_ELx_EC_IABT_CUR << ESR_ELx_EC_SHIFT);
>  if (!is_iabt)
>   esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
>  vcpu_sys_reg(vcpu, ESR_EL1) = esr | ESR_ELx_FSC_EXTABT;
> }
>
> static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
>     unsigned long addr)
> {
>  u32 vect_offset;
>  u32 *far, *fsr;
>  bool is_lpae;
>  if (is_pabt) {
>   vect_offset = 12;
>   far = &vcpu_cp15(vcpu, c6_IFAR);
>   fsr = &vcpu_cp15(vcpu, c5_IFSR);
>  } else { /* !iabt */
>   vect_offset = 16;
>   far = &vcpu_cp15(vcpu, c6_DFAR);
>   fsr = &vcpu_cp15(vcpu, c5_DFSR);
>  }
>  prepare_fault32(vcpu, COMPAT_PSR_MODE_ABT | COMPAT_PSR_A_BIT, vect_offset);
>  *far = addr;
>  /* Give the guest an IMPLEMENTATION DEFINED exception */
>  is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
>  if (is_lpae)
>   *fsr = 1 << 9 | 0x34;
>  else
>   *fsr = 0x14;
> }
>
>
> /**
>  * kvm_inject_dabt - inject a data abort into the guest
>  * @vcpu: The VCPU to receive the undefined exception
>  * @addr: The address to report in the DFAR
>  *
>  * It is assumed that this code is called from the VCPU thread and that the
>  * VCPU therefore is not currently executing guest code.
>  */
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
> {
>  if (!(vcpu->arch.hcr_el2 & HCR_RW))
>   inject_abt32(vcpu, false, addr);
>  else
>   inject_abt64(vcpu, false, addr);
> }
>
>
>>
>>
>> Thanks,
>>
>> James