[patch 00/37] cpu/hotplug, x86: Reworked parallel CPU bringup

Andrew Cooper andrew.cooper3 at citrix.com
Mon Apr 17 03:44:06 PDT 2023


On 17/04/2023 11:30 am, Peter Zijlstra wrote:
> On Sat, Apr 15, 2023 at 01:44:13AM +0200, Thomas Gleixner wrote:
>
>> Background
>> ----------
>>
>> The reason why people are interested in parallel bringup is to shorten
>> the (kexec) reboot time of cloud servers to reduce the downtime of the
>> VM tenants. There are obviously other interesting use cases for this
>> like VM startup time, embedded devices...
> ...
>
>>   There are two issues there:
>>
>>     a) The death by MCE broadcast problem
>>
>>        Quite some (contemporary) x86 CPU generations are affected by
>>        this:
>>
>>          - MCE can be broadcasted to all CPUs and not only issued locally
>>            to the CPU which triggered it.
>>
>>          - Any CPU which has CR4.MCE == 0, even if it sits in a wait
>>            for INIT/SIPI state, will cause an immediate shutdown of the
>>            machine if a broadcasted MCE is delivered.
> When doing kexec, CR4.MCE should already have been set to 1 by the prior
> kernel, no?

No(ish).  Purgatory can't take #MC, or NMIs for that matter.

It's cleaner to explicitly disable CR4.MCE and let the system reset
(with all the MC banks properly preserved), than it is to take #MC while
the IDT isn't in sync with the handlers, and wander off into the weeds.

~Andrew
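
For readers following the CR4.MCE discussion, the bit manipulation involved
can be sketched as below. This is only an illustration under stated
assumptions, not the kernel's actual kexec code: the helper names
(cr4_mce_enabled, cr4_without_mce, cr4_handover_value) are hypothetical, and
the functions operate on a plain integer rather than reading or writing the
real control register, which requires ring 0. The only architectural fact
relied on is that CR4.MCE is bit 6 of CR4.

```c
#include <assert.h>
#include <stdint.h>

/* CR4.MCE (Machine-Check Enable) is bit 6 of CR4. */
#define X86_CR4_MCE (UINT64_C(1) << 6)

/* Hypothetical helper: is #MC delivery enabled in this CR4 value? */
static inline int cr4_mce_enabled(uint64_t cr4)
{
	return (cr4 & X86_CR4_MCE) != 0;
}

/* Hypothetical helper: the same CR4 value with MCE cleared. */
static inline uint64_t cr4_without_mce(uint64_t cr4)
{
	return cr4 & ~X86_CR4_MCE;
}

/*
 * Hypothetical helper: CR4 value to use while running code that has no
 * usable #MC handler (for example, while the IDT is not in sync with the
 * handlers). With MCE clear, a machine check shuts the CPU down instead
 * of vectoring through a stale IDT entry.
 */
static inline uint64_t cr4_handover_value(uint64_t cr4)
{
	uint64_t out = cr4_without_mce(cr4);

	assert(!cr4_mce_enabled(out)); /* postcondition: MCE is off */
	return out;
}
```

In a real kernel the resulting value would be written back with a privileged
mov to CR4; the sketch only shows which bit changes and leaves every other
bit untouched.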


