kdump: quad core Opteron

Bob Montgomery bob.montgomery at hp.com
Tue Oct 7 11:59:51 EDT 2008


On Tue, 2008-10-07 at 13:24 +0000, Vivek Goyal wrote:
> On Tue, Oct 07, 2008 at 06:21:52PM +0530, Chandru wrote:
> > kdump on a quad core Opteron blade machine doesn't give a complete
> > vmcore on the system.   All works well until we attempt to copy
> > /proc/vmcore to some target place (disk, network). The system
> > immediately resets without any OS messages after having copied a few
> > MB of the vmcore file.  The problem also occurs with 2.6.27-rc8 and
> > the latest kexec-tools.  If we pass 'mem=4G' as a boot parameter to
> > the first kernel, then kdump succeeds in copying a readable vmcore
> > to /var/crash.
> >
> 
> Hi Chandru,
> 
> How much memory does this system have? Can you also paste the output of
> /proc/iomem from the first kernel?
> 
> Does this system have a GART? It looks like we are accessing some memory
> area which the platform does not like. (We saw issues with GART in the past.)
> 
> Can you also provide /proc/vmcore ELF header (readelf output), in both
> the cases (mem=4G and without that).
> 
> You can try putting some printk in /proc/vmcore code and see which
> physical memory area you are accessing when system goes bust. If in all
> the failure cases it is same physical memory area, then we can try to find
> what's so special about it.

Or you can assume this is pretty much exactly the problem I ran into in
August.  I've attached the patch that I'm using with our 2.6.18 kernel
to disable CPU-side access by the GART, which prevents the problem on
our Family 10h systems.  You'll need to fix the directory name for
kernels newer than the arch/x86_64 merge (the file now lives under
arch/x86).

Now that someone else has seen the problem, if this fixes it, I'll
submit the patch upstream. 

Here's the README for the patch:

This patch changes the initialization of the GART (in
pci-gart.c:init_k8_gatt) to set the DisGartCpu bit in the GART Aperture
Control Register.  Setting the bit prevents requests from the CPUs from
accessing the GART.  In other words, CPU memory accesses within the
aperture's address range will not cause the GART to perform an address
translation.

The aperture area was already being unmapped at the kernel level with
clear_kernel_mapping() to prevent accesses from the CPU, but that
kernel-level unmapping is not in effect in the kexec'd kdump kernel.
The DisGartCpu setting, by contrast, does persist through the kexec of
the kdump kernel, so the kdump kernel is prevented from interacting with
the GART when it reads dump memory areas that include the address range
of the GART aperture.

Although the patch can be applied to the kdump kernel, it is not
exercised there because the kdump kernel doesn't attempt to initialize
the GART.
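
For readers without the attachment, the register change described above
can be sketched roughly as follows.  This is not the attached patch: it
is a standalone illustration that assumes the bit layout given in AMD's
BKDG for the GART Aperture Control Register (GartEn at bit 0, DisGartCpu
at bit 4, DisGartIo at bit 5, PCI config offset 0x90).  The macro names
and the helper gart_ctl_disable_cpu() are hypothetical, and the real
code in init_k8_gatt would read and write the register through the PCI
config accessors rather than operate on a plain value.

```c
#include <stdint.h>

/* Assumed layout of the GART Aperture Control Register (per AMD's BKDG);
 * these names are illustrative, not taken from the attached patch. */
#define GART_APERTURE_CTL  0x90        /* PCI config offset (assumed) */
#define GARTEN             (1u << 0)   /* enable GART translation */
#define DISGARTCPU         (1u << 4)   /* ignore CPU accesses to the aperture */
#define DISGARTIO          (1u << 5)   /* ignore I/O accesses to the aperture */

/* Hypothetical helper: given the current register value, return the value
 * to write back so that the GART stays enabled for device (I/O) accesses
 * but no longer translates CPU accesses.  All other bits, such as the
 * aperture size field, are left unchanged. */
uint32_t gart_ctl_disable_cpu(uint32_t ctl)
{
    ctl |= GARTEN | DISGARTCPU;  /* enable GART, block CPU-side access */
    ctl &= ~DISGARTIO;           /* keep device-side translation working */
    return ctl;
}
```

In a kernel context the returned value would be written back to offset
0x90 of each northbridge device after reading the current contents;
that surrounding loop is omitted here.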

Bob Montgomery
working at HP




-------------- next part --------------
A non-text attachment was scrubbed...
Name: gart.cpuside.patch
Type: text/x-patch
Size: 557 bytes
Desc: not available
Url : http://lists.infradead.org/pipermail/kexec/attachments/20081007/8458d2df/attachment.bin 

