[PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu

Neil Horman nhorman at tuxdriver.com
Tue Nov 27 10:34:44 EST 2007

On Tue, Nov 27, 2007 at 07:56:44AM -0700, Eric W. Biederman wrote:
> Andi Kleen <ak at suse.de> writes:
> > this is any less reliable than what we have currently.
> >> 
> >> It doesn't make things more reliable, and it adds code to a code path
> >> that already has too much code to be solidly reliable (thus your
> >> problem).
> >> 
> >> Putting the system back in PIC legacy mode on the kexec on panic path
> >> was supposed to be a short term hack until we could remove the need
> >> by always delivering interrupts in apic mode.
> >> 
> >> If you can't root cause your problem and figure out how the apics
> >> are misconfigured for legacy mode
> >
> > Probably legacy mode always routes to CPU #0. Makes sense and is
> > not really a misconfiguration of legacy mode.
> Possible. So far I have not seen a hardware setup that would force
> interrupts to cpu #0 in legacy mode.  But I would not be truly
> surprised if it happened that there was hardware that only worked that
> way.

That would certainly explain the behavior I am observing here.
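
To make that hypothesis concrete, here is a toy model in C. It is not
kernel code: the struct, the function names and NR_CPUS are all made up
for illustration, and the "legacy mode always routes to CPU #0" rule is
exactly the assumption being discussed, not something I have confirmed
on the hardware. It only shows why a kdump kernel started on a non-boot
cpu would never see a timer tick if that assumption holds.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* hypothetical cpu count for the model */

/* Hypothetical per-cpu state; not a kernel structure. */
struct cpu {
    bool irqs_enabled;   /* false for cpus halted with interrupts off */
    unsigned ticks;      /* timer interrupts this cpu has handled */
};

/* ASSUMPTION under discussion: legacy PIC mode routes everything to cpu 0. */
int legacy_route(void)
{
    return 0;
}

/* Deliver one timer tick to 'target'; it is lost if that cpu has irqs off. */
void deliver_timer_tick(struct cpu cpus[], int target)
{
    assert(target >= 0 && target < NR_CPUS);
    if (cpus[target].irqs_enabled)
        cpus[target].ticks++;
}

/*
 * Model a kdump boot on 'crash_cpu': every other cpu is halted with
 * interrupts disabled. Returns how many of 'nticks' timer ticks the
 * cpu running the kdump kernel actually receives.
 */
unsigned kdump_ticks_seen(int crash_cpu, int nticks)
{
    struct cpu cpus[NR_CPUS] = {0};

    assert(crash_cpu >= 0 && crash_cpu < NR_CPUS);
    cpus[crash_cpu].irqs_enabled = true;

    for (int i = 0; i < nticks; i++)
        deliver_timer_tick(cpus, legacy_route());

    return cpus[crash_cpu].ticks;
}
```

In this model kdump_ticks_seen(0, 10) returns 10, while
kdump_ticks_seen(1, 10) returns 0: every tick is routed to cpu 0, which
is halted with interrupts off, so the kdump kernel on cpu 1 waits
forever for a timer interrupt.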

> > But if CPU #0 has interrupts disabled no interrupts get delivered.
> >
> > So choices are:
> > - Move to CPU #0
> > - Do not use legacy mode during shutdown.
>     (Do not use legacy mode in the kdump kernel. removing it from shutdown
>      is just minor optimization)
> > - Or do not rely on interrupts after enabling legacy mode
> > - Or do not disable interrupts on the other CPUs when they're
> > halted.
> >
> > First and last option are probably unreliable for the kdump case.
> > Second or third sound best. 
> >
> > I suspect the real fix would be to enable IOAPIC mode really
> > early and never use the timers in legacy mode. Then the kdump
> > kernel wouldn't care about the legacy mode pointing to the wrong CPU.
> Exactly.  If we can work out the details that should be a much more reliable
> mode of operation.
> > IIRC Eric even had a patch for that a long time ago, but it broke some
> > things so it wasn't included. But perhaps it should be revisited.
> My real problem was the failure case was obscure (a bad interaction
> with ACPI on Linus's laptop) and I didn't have the time to track it
> down when it showed up.
> My patch had two parts.  Some cleanups to enable the code to be enabled
> early, and the actual early enable.  I figure if we can get the
> cleanups in one major kernel version and then in the next enable
> the apic mode before we start getting interrupts we should be in good
> shape.
> I expect with x86 becoming an embedded platform with multiple cpus we
> may start seeing systems that don't actually support legacy PIC mode
> for interrupt delivery.
Do you have a pointer to the old patch set?  I'd like to try it out on the
failing system here.


> Eric

