[PATCH v2] KVM: arm/arm64: Signal SIGBUS when stage2 discovers hwpoison memory
gengdongjiu
gengdongjiu at huawei.com
Wed Jun 21 23:47:06 PDT 2017
Hi James,
On 2017/6/21 20:44, James Morse wrote:
> Hi gengdongjiu,
>
> On 21/06/17 11:59, gengdongjiu wrote:
>> On 2017/6/21 17:53, James Morse wrote:
>>> I think we discussed this before[0], your CPU has a feature called 'hwpoison'
>>> that is uses to support RAS. Linux also has a feature called 'hwpoison' [1][2],
>>> which handles the offline-ing of memory pages when it receives a notification
>>> through APEI. I've tried to call this memory_failure() to avoid this confusion.
>>>
>>> This patch is to handle stage2 faults when the page was removed from the stage2
>>> mapping by the memory_failure() code. v3 of this patch[3] does a much better job
>>> of describing this.
>>>
>>> (... I don't think your question is related to this patch ...)
>>
>> I know your meaning about the Linux 'hwpoison' feature.
>
> Okay, I assume we are also talking about firmware-first RAS events and your APEI
> notifications use SEA.
>
>
> I think we are looking at different parts of the code, here is what I see should
> happen:
>
> For a Synchronous External Abort the ESR {D,I}FSC bits will be in the range that
> indicates an external abort. For a data:external-abort kvm_handle_guest_abort()
> will matches this with kvm_vcpu_dabt_isextabt() and makes no further attempt to
> handle the fault.
The kvm_vcpu_dabt_isextabt() will check the ESR EA bit[9]
The EA bit [9] is "IMPLEMENTATION DEFINED", so whether match with the kvm_vcpu_dabt_isextabt() depend on the CPU design.
In my platform the EA bit is zero for the RAS Synchronous External Abort, so will not matches the kvm_vcpu_dabt_isextabt(), so the code will continue handle the fault.
static inline bool kvm_vcpu_dabt_isextabt(struct kvm_vcpu *vcpu)
{
return kvm_vcpu_get_hsr(vcpu) & HSR_DABT_EA;
}
EA, bit [9]
External Abort. As described in the Architecture Reference Manual. Provides an IMPLEMENTATION
DEFINED classification of the external abort.
DFSC, bits [5:0], when synchronous external Data Abort
Data Fault Status Code. The possible values of this field are:
0b010000 Synchronous External Abort on memory access.
0b0101xx Synchronous External Abort on translation table walk or hardware update of translation
table. DFSC[1:0] encode the level.
Further values are described in the Architecture Reference Manual. The Parity Error codes are not
used and reserved in the RAS extension.
IFSC, bits [5:0], when synchronous external Instruction Abort
Instruction Fault Status Code. The possible values of this field are:
0b010000 Synchronous External Abort on memory access.
0b0101xx Synchronous External Abort on translation table walk or hardware update of translation
table. IFSC[1:0] encode the level.
Further values are described in the Architecture Reference Manual. The Parity Error codes are not
used and reserved in the RAS extension.
>
> Tyler's RAS series added an earlier check:
>> /*
>> * The host kernel will handle the synchronous external abort. There
>> * is no need to pass the error into the guest.
>> */
>> if (is_abort_sea(fault_status)) {
>> if (!handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
>> return 1;
>> }
>>
>
> This goes on to call ghes_notify_sea() which will handle the error and cause KVM
> to exit this function. KVM makes no further attempt to handle the fault as APEI
> should have done everything necessary. KVM will re-enter the guest, unless there
> are signals pending.
In my code, there is not Tyler's such modification.
firstly, the poison error is related with RAS, and poison error match the is_abort_sea(fault_status)
so the KVM will exit, and SIGBUS signal have no chance to be delivered.
I suggest it does not return in Tyler's patch or move kvm_send_hwpoison_signal function before return.
CC Tyler.
>
> (You're right that here the fault_ipa is the wrong thing to pass, but
> handle_guest_sea() doesn't use it...)
>
> We do need to enable HCR_EL2.TEA which was added with v8.2s RAS extensions, but
> the cpufeature patch is a pre-requisite.
HCR_EL2.TEA will Route synchronous External Abort exceptions from Non-secure EL0 and EL1 to EL2, if not routed to EL3.
in the firmware-first RAS solution, this bit can not control to route to El2 because the exception is route to EL3.
enabling HCR_EL2.TEA can be better for the non firmware-first RAS solution.
but can not solve this issue.
>
>
>> Let see the code that how to get the "pfn"
>>
>> ///get the pfn
>> fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
>> gfn = fault_ipa >> PAGE_SHIFT;
>> pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
>
>> As shown in above code, when happen SEA, the fault_ipa is got from the HPFAR_EL2 register.
>> if the HPFAR_EL2 does not record the IPA. the fault_ipa is zero, then gfn is zero, so the pfn is unknown.
>
>> so below judgement always false although firmware notify the memory_failure through APEI, because we do not get the right fault memory page.
>> using this API "kvm_vcpu_get_fault_ipa" can not get the right fault memory page if cpu does not update the HPFAR_EL2.
>
> The path to the below code that uses fault_ipa->gfn->pfn->hva is:
> kvm_handle_guest_abort()
> user_mem_abort()
> kvm_send_hwpoison_signal().
>
> But for an external abort due to RAS we never get past kvm_handle_guest_abort()
> as we call out to the APEI ghes to handle the RAS error instead.
so this patch can not work.
>
> For a data external abort that wasn't due to RAS we still don't get here as KVM
> will hit the vcpu with kvm_inject_vabt() instead.
poison error is only related with RAS, I think we can mainly consider the RAS.
>
>
>> + if (pfn == KVM_PFN_ERR_HWPOISON) {
>> + kvm_send_hwpoison_signal(hva, vma);
>> + return 0;
>> + }
>
> Are you seeing a guest repeatedly trigger external-abort on the same address?
> If so, can you add debug messages to check if handle_guest_sea() is called? Does
> it find work to do? If so kvm_handle_guest_abort() should exit.
If use Tyler's this modification, the kvm_handle_guest_abort will exit when triggering synchronous External Abort,
in my verification, not have this modification, so the kvm_handle_guest_abort() does not exit
this change is added recently by Tyler
> Do your CPER records cause memory_failure() to be run?
> Does try_to_unmap() in hwpoison_user_mappings() run and succeed? If so the
> faulty page should be unmapped from stage2, from now on the guest should only
> trigger normal stage2 faults for this address.
>
> How does your firmware choose to route injected-External-Aborts for APEI
> notifications? Do we need to enable HCR_EL2.TEA to make this work properly?
my firmware will route the injected-External-Aborts to the hypervisor if the error come from guest OS.
enabling the CR_EL2.TEA can not solve this issue.
because the exception still route to EL3. the SIGBUS signal have not chance to be sent
so I have two suggestions for this issue:
(1) modify Tyler's patch, not return.
(2) call the kvm_send_hwpoison_signal before return.
>
>
>> so may be you need to double confirm that whether armv8.0/armv8.2 standard
>> CPU can always update the HPFAR_EL2 registers.
>
> I don't know what the CPUs do, but the ARM-ARM allows the FAR to be not-valid
> for some external aborts. This is indicated by the Far-not-Valid bit in the ESR.
>
> With firmware-first RAS the only external aborts that Linux should see are SEA
> APEI notifications. We shouldn't expect firmware to set much beyond the minimum
> for these. The KVM code touched by this patch shouldn't run for an APEI
> notification.
>
>
> Thanks,
>
> James
>
>
> .
>
More information about the linux-arm-kernel
mailing list