[PATCH] kexec: force x86_64 arches to boot kdump kernels on boot cpu
Yinghai Lu
yhlu.kernel at gmail.com
Tue Dec 11 20:07:18 EST 2007
On Dec 11, 2007 4:52 PM, Neil Horman <nhorman at tuxdriver.com> wrote:
> On Tue, Dec 11, 2007 at 04:16:32PM -0800, Ben Woodard wrote:
> > We may need to go back and do some additional work on this. It doesn't
> > seem to be quite as cut and dried as we initially thought.
> >
> > This quirk doesn't appear to work on virtually the same motherboard with
> > the Barcelona processors in it. It also may be sensitive to the firmware
> > version. More extensive testing on a larger number of pre-production systems
> > is not showing it to be as effective as it appeared to be initially on the
> > testbed.
> >
> > I'm doing some retesting to figure out what exact situations and
> > collection of patches were able to make it work before.
> >
> Ben, please let's be clear about this. You say this patch doesn't help on a new
> system. Even though it's almost the exact same system, it's not the same system.
> Does this patch work consistently on the system you initially reported the
> problem on? I've done enough work on this at this point that I'm invested in
> not abandoning this fix. If this solves the problem on dual core systems, but
> not quad core, I'd much rather move forward with this fix and address your quad
> core problem as a separate issue.
>
> Neil
>
>
> > -ben
> >
> >
> >
> > Neil Horman wrote:
> > > Recently a kdump bug was discovered in which a system would hang inside
> > > calibrate_delay during the booting of the kdump kernel. This was caused by the
> > > fact that the jiffies counter was not being incremented during timer
> > > calibration. The root cause of this problem was found to be a BIOS
> > > misconfiguration of the hypertransport bus. On systems affected by this hang,
> > > the BIOS had assigned APIC ids which used extended apic bits (more than the
> > > nominal 4 bit ids), but failed to configure bit 17 of the hypertransport
> > > transaction config register, which controls the mask for the destination
> > > field of interrupt packets across the ht bus (see section 3.3.9 of
> > > http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26094.PDF).
> > > If a crash occurs on a cpu with an APIC id that extends beyond 4 bits, it will
> > > not receive interrupts during the kdump kernel boot, and this hang will be the
> > > result. The fix is to add this patch, which adds an early pci quirk check, to
> > > forcibly enable this bit in the httcfg register. This enables all cpus on a
> > > system to receive interrupts, and allows kdump kernel bootup to proceed
> > > normally.
> > >
> > > Regards
> > > Neil
> > >
> > >
> > > Signed-off-by: Neil Horman <nhorman at tuxdriver.com>
> > >
...
> > > static struct chipset early_qrk[] __initdata = {
> > > - { PCI_VENDOR_ID_NVIDIA, nvidia_bugs },
> > > - { PCI_VENDOR_ID_VIA, via_bugs },
> > > - { PCI_VENDOR_ID_ATI, ati_bugs },
> > > + { PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, nvidia_bugs },
> > > + { PCI_VENDOR_ID_VIA, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, via_bugs },
> > > + { PCI_VENDOR_ID_ATI, PCI_ANY_ID, PCI_CLASS_BRIDGE_PCI, PCI_ANY_ID, ati_bugs },
> > > + { PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB, PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, fix_hypertransport_config },
==>
+ { PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB,
    PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID, fix_hypertransport_config },
+ { PCI_VENDOR_ID_AMD, 0x1200, PCI_CLASS_BRIDGE_HOST, PCI_ANY_ID,
    fix_hypertransport_config },
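
To show why the second entry matters, here is a rough standalone sketch of how a vendor/device/class table like this could be matched. It is not the kernel code: struct quirk, find_quirk and ANY_ID are made-up names, and I am assuming the fourth field acts as a class mask. The numeric IDs are the usual values for PCI_VENDOR_ID_AMD (0x1022), PCI_DEVICE_ID_AMD_K8_NB (0x1100) and PCI_CLASS_BRIDGE_HOST (0x0600); 0x1200 is presumably the Barcelona counterpart of the K8 northbridge function.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu            /* stand-in for PCI_ANY_ID */

/* Hypothetical layout mirroring the five fields used in the patch. */
struct quirk {
    uint32_t vendor;
    uint32_t device;
    uint32_t class_code;
    uint32_t class_mask;              /* ASSUMPTION: fourth field is a class mask */
    const char *handler;              /* handler name instead of a function pointer */
};

static const struct quirk quirks[] = {
    { 0x1022, 0x1100, 0x0600, ANY_ID, "fix_hypertransport_config" }, /* K8 */
    { 0x1022, 0x1200, 0x0600, ANY_ID, "fix_hypertransport_config" }, /* 0x1200 */
};

/* Return the first entry matching the device, or NULL if none does. */
static const struct quirk *find_quirk(const struct quirk *tbl, size_t n,
                                      uint32_t vendor, uint32_t device,
                                      uint32_t class_code)
{
    for (size_t i = 0; i < n; i++) {
        const struct quirk *q = &tbl[i];

        if (q->vendor != ANY_ID && q->vendor != vendor)
            continue;
        if (q->device != ANY_ID && q->device != device)
            continue;
        /* Class bits selected by the mask must be identical. */
        if ((q->class_code ^ class_code) & q->class_mask)
            continue;
        return q;
    }
    return NULL;
}
```

With only the K8 entry in the table, a host bridge reporting device id 0x1200 would fall through without the quirk ever running, which would fit what Ben is seeing on the Barcelona boards.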
I still think the better way is to ask Supermicro to update their BIOS
to use the newer code from AMD.
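
For reference, what the BIOS should be doing (and what the quirk does on its behalf) boils down to something like the sketch below. Bit 17 comes from Neil's description above. Treating bit 18 as the "extended APIC ids in use" flag is my assumption and is not stated in this thread, and the macro and function names are made up. It works on a plain 32-bit value so it can be tried outside the kernel; the real quirk would read and write the register through PCI config space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 17 is the bit named in the patch description. */
#define HTCFG_EXT_DEST_MASK (1u << 17)
/* ASSUMPTION: bit 18 indicates that extended APIC ids are in use. */
#define HTCFG_EXT_APIC_ID   (1u << 18)

/*
 * Return the value that should be written back to the register.
 * *changed is set to true only when bit 17 had to be turned on.
 */
static uint32_t fixup_htcfg(uint32_t htcfg, bool *changed)
{
    *changed = false;
    if ((htcfg & HTCFG_EXT_APIC_ID) && !(htcfg & HTCFG_EXT_DEST_MASK)) {
        htcfg |= HTCFG_EXT_DEST_MASK;
        *changed = true;
    }
    return htcfg;
}
```

A correctly configured BIOS leaves nothing to change here, so the quirk becomes a no-op once the firmware is fixed.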
YH