[PATCH] x86: Disable kexec for TDX guests

Dave Hansen dave.hansen at intel.com
Sun Mar 26 10:01:23 PDT 2023


On 3/25/23 12:25, Kirill A. Shutemov wrote:
> On Sat, Mar 25, 2023 at 09:25:36AM -0700, Dave Hansen wrote:
>> On 3/25/23 09:01, Kirill A. Shutemov wrote:
>>> The last item is tricky. TDX guests use ACPI MADT MPWK to bring up
>>> secondary CPUs. The mechanism doesn't allow to put a CPU back offline if
>>> it has woken up.
>> ...
>>> +int arch_kexec_load(void)
>>> +{
>>> +	if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
>>> +		pr_warn_once("Disable kexec: not yet supported in TDX guest\n");
>>> +		return -EOPNOTSUPP;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>
>> So, let's put all this together:
>>
>> 1. TDX implementations use MADT for wakeup exclusively right now (but
>>    are not necessarily _required_ to do so forever)
>> 2. MADT doesn't support CPU offlining
>> 3. kexec() requires offlining
>>
>> Thus, current TDX implementations can't support TDX guests.  This
>> *doesn't* say that TDX will always use the MADT for wakeups.
>>
>> Yet, the check you have here is for TDX and *not* for the MADT.
> 
> As I described in the commit message there are more than MADT that is
> required to get kexec in TDX guest.

I kinda think we should do both.

Let's make sure that all systems that depend on MADT wakeups can't
kexec() until the ACPI folks work out what to do there.

Separately, let's either fix or *mark* the kexec()-incompatible pieces
that *ARE* specific to TDX.

>> That seems wrong.
>>
>> Let's say SEV or arm64 comes along and uses the MADT for their guests.
>> They'll add another arch_kexec_load(), with a check for *their* feature.
>>
>> This all seems like you should be disabling kexec() the moment the MADT
>> CPU wakeup is used instead of making it based on TDX.
> 
> I guess we can go this path if you are fine with taking CR4.MCE and shared
> memory reverting patches (they require some rework, but I can get them
> into shape quickly). After that we can forbid kexec on machines with MADT
> if nr_cpus > 1.

This goes back to what I asked before: is anyone actually going to *use*
a single-processor system that wants to kexec()?  If not, let's not
waste the time to introduce code that is just going to bitrot.  Just
mark it broken and move on with life.

I'm also a _bit_ curious what the implications of the CR4.MCE
preservation are.  IIRC, systems are quite a bit less stable when
CR4.MCE==0.  So, maybe there are some benefits to leaving it set during
kexec() for everyone.  But, that probably also involves having a #MC
handler in the mix and that's obviously trouble while we're switching
kernels and messing around in the guts of things like paging configuration.

The CR4.MCE change is simple enough, but it also _completely_ glossed
over the implications to non-TDX users.

I guess the overall message here is: please try to think of the big
picture a _little_ outside of the TDX world.  This patch's approach is a
bit myopic.



More information about the kexec mailing list