答复: [PATCH v6 4/7] arm64: kvm: support user space to query RAS extension feature

Thu Sep 14 05:38:52 PDT 2017

Hi gengdongjiu,

On 08/09/17 18:36, gengdongjiu wrote:
>> The code to signal memory-failure to user-space doesn't depend on the CPU's RAS-extensions.
> I roughly check your answer and agree with your general idea.
> late I will check it in detail.

> I have a question, do you sure that if CPU does not support RAS-extensions kernel can still call
> memory-failure() to send signal to qemu?

If CONFIG_MEMORY_FAILURE is selected then the kernel has the code to send
SIGBUS_MCCERR_A* signals to user space.
This can be triggered by any GHES. A case in point: the 'AMD Seattle Overdrive'
under my desk has a HEST with four polled GHES entries. If any of these generate
a memory error the kernel will trigger the memory_failure() code.

Without all the ACPI stuff we still have CONFIG_HWPOISON_INJECT,
madvise(MADV_HWPOISON).
sysfs's: 'soft_offline_page' and 'hard_offline_page' as mechanisms that may
trigger memory_failure().

User space shouldn't try and guess whether its likely to get one of these.

I've been using these mechanisms to test SDEI virtualisation with kvmtool.

> After my checking the code, the general flow is RAS module detects the error or CPU consumes the
> hardware poison data, happen exception, then EL3 firmware records the address
to APEI table and
> send notification to kernel. Kernel parses the APEI table to get address and
call memory_failure() to
> identify the page to poison. That is to say, usually, after RAS detect the
error, it call memory_failure(),
> otherwise, it does not know whether this address is poison.

> I am worried about one thing, if hardware does not has RAS, OS cannot know which address is poison,
> so it cannot identify the address , then the address that is delivered to
Qemu(user space) may not right.

You've switched from talking about the CPU's 'ARM v8.2 RAS extensions' to 'RAS'.
Supporting memory_failure() is a linux:RAS feature, it doesn't depend on the
cpu:'ARM v8.2 RAS extensions'.

> As you said, kernel can also call memory_failure() even without RAS support. in this without RAS case,
> how it consider the address is poison and needs to send SIGBUS to QEMU?

Which component doesn't have RAS? The CPU? Okay, what about the memory controller:
The memory controller may catch a parity error during dram refresh/scrub, and
signal firmware via an interrupt. Firmware can then read the affected address
from the memory-controller's error registers and report it to the OS as a
firmware-first error.
The CPU doesn't need any RAS features for this to work.

To caricature your argument: 'the CPU doesn't have this particular version of
this particular RAS feature, thus no component in the system has any RAS feature'.

APEI's firmware-first is an abstraction so that we don't need to know which
system components have RAS features (or how to drive them) , we let firmware do
the work and tell us the results.

>> If Qemu supports notifying the guest about RAS errors using CPER records, it should generate a HEST describing firmware first. It can then
>> choose the notification methods, some of which may require optional KVM APIs to support.
>>
>> Seattle has a HEST, it doesn't support the CPU RAS-extensions. The kernel can notify user-space about memory_failure() on this machine. I
>> would expect Qemu to be able to receive signals and describe memory errors to a guest (1).
> 
> Usually we consider the address got from APEI table is poison. If so, I want to know, without RAS and APEI table, how it identify the address to hwpoison?

~s/APEI/CPER/

I agree the main path into memory_failure() is from APEI, and we get the address
from the CPER records. This isn't the only path into memory_failure, and there
may be more in the future.
None of the firmware-first stuff depends on CPU RAS features, it may be
reporting errors from some other component in the system.

Back to the issue at hand: should qemu/kvmtool generate a HEST?
If these tools want to inject emulated errors into a guest: yes. (this may be
totally independent of what the host supports)
If these tools want to pass memory-failure notifications for guest-memory into a
guest: yes.
You may want to make this depend on whether the host supports memory-failure
notifications, but you  shouldn't care where they come from.

Does the host support memory-failure notifications?
You can poke around in /proc to find this out, /proc/sys/vm has:
> memory_failure_early_kill
> memory_failure_recovery
when the kernel was built with CONFIG_MEMORY_FAILURE, but it user-space has
support for this stuff I don't know why you wouldn't unconditionally turn it on.
(what happens if you migrate between hosts with different support...)

James