kexec reboot fails with extra wbinvd introduced for AMD SME

Wed Jan 17 14:53:32 PST 2018

On 1/17/2018 2:01 PM, Tom Lendacky wrote:
> On 1/17/2018 1:42 PM, Linus Torvalds wrote:
>> On Tue, Jan 16, 2018 at 11:22 PM, Dave Young <dyoung at redhat.com> wrote:
>>>
>>> For the kexec reboot hang, if I remove the wbinvd in stop_this_cpu()
>>> then kexec works fine, like this:
>>
>> Honestly, I think we should apply that patch regardless.
>>
>> Using 'wbinvd' should not be some "just because of random reasons".
>> There are CPUs with errata on wbinvd, and the thing in general is
>> slow and nasty.
>>
>> Doing the wbinvd in a loop sounds even stranger.
>>
>> If we're only doing it because of some SME issue, why isn't it
>> dependent on SME? And why is it inside that loop at all?
> 
> My original patches did check for X86_FEATURE_SME and only do the
> wbinvd if SME was supported (although still in the loop).  The general
> consensus was to just do the wbinvd no matter what and so it is as it is
> today.
> 
> It can probably be outside of the loop.  The issue I was seeing was
> memory corruption from the stack when using halt() with paravirt ops
> enabled.  So a native_halt() should be used.
> 
>>
>> Anyway, does it work for you if you just do the wbinvd() once, outside
>> the loop? Admittedly the loop shouldn't actually loop (hlt with
>> interrupts disabled), but who the hell knows.. Some of the errata
>> around SME have been about machine check exceptions or something.
> 
> I think that should work as long as it's a native_wbinvd() call and it
> can also be conditional on boot_cpu_has(X86_FEATURE_SME).
> 
> I'll do some testing.

Looks like everything is good with the suggested changes.  Patch to follow
shortly.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> See commit a68e5c94f7d3 ("x86, hotplug: Move WBINVD back outside the
>> play_dead loop") for another example where wbinvd was inside a loop
>> and apparently caused some odd issues.
>>
>>               Linus
>>
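
For readers following the thread, here is a rough userspace sketch of the
control-flow change being discussed: one cache flush before the halt loop,
done only when SME is supported, instead of a flush on every loop iteration.
It is not the kernel patch. The struct, the model_* helpers, and the
"wakeups" bound are hypothetical stand-ins for boot_cpu_has(X86_FEATURE_SME),
native_wbinvd() and native_halt(), which cannot run outside the kernel, and
the "old" variant is only a reading of how the thread describes the existing
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; none of this is kernel code. */
struct cpu_model {
    bool has_sme;      /* stands in for boot_cpu_has(X86_FEATURE_SME) */
    int wbinvd_calls;  /* how often the modeled native_wbinvd() ran */
    int halt_calls;    /* how often the modeled native_halt() ran */
};

/* Stand-ins that only count calls instead of touching hardware. */
static void model_native_wbinvd(struct cpu_model *cpu) { cpu->wbinvd_calls++; }
static void model_native_halt(struct cpu_model *cpu)   { cpu->halt_calls++; }

/*
 * Shape described as the current behavior: the flush is unconditional
 * and sits inside the loop. The real loop is endless; "wakeups" bounds
 * it here so the model terminates.
 */
static void stop_this_cpu_old(struct cpu_model *cpu, int wakeups)
{
    assert(wakeups > 0);
    for (int i = 0; i < wakeups; i++) {
        model_native_wbinvd(cpu);
        model_native_halt(cpu);
    }
}

/*
 * Shape suggested in the thread: flush once, outside the loop, and
 * only if SME is supported; the loop itself only halts.
 */
static void stop_this_cpu_new(struct cpu_model *cpu, int wakeups)
{
    assert(wakeups > 0);
    if (cpu->has_sme)
        model_native_wbinvd(cpu);
    for (int i = 0; i < wakeups; i++)
        model_native_halt(cpu);
}
```

In this model, calling stop_this_cpu_new() issues at most one flush however
many times the halt returns, and none at all on a CPU without SME, whereas
stop_this_cpu_old() flushes once per iteration.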