32TB kdump

Fri Jun 21 10:17:14 EDT 2013

I have been testing recent kernels and kexec-tools for kdump of large-memory
systems, and have found good results.

--------------------------------
UV2000  memory: 32TB  crashkernel=2G@4G
command line  /usr/bin/makedumpfile --non-cyclic -c --message-level 23 -d 31 \
   --map-size 4096 -x /boot/vmlinux-3.10.0-rc5-linus-cpw+ /proc/vmcore \
   /tmp/cpw/dumpfile

page scanning  570 sec.
copying data  5795 sec. (72G)
(The data copy ran out of disk space at 23%, so the time and size above are
 extrapolated.)
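
The extrapolation can be reproduced with simple linear scaling. This is a
minimal sketch that assumes the copy rate and the output size per unit of
memory stayed constant after the disk filled up (the post does not say exactly
how the figures were extrapolated). The function name `extrapolate` is
hypothetical.

```python
from typing import Tuple


def extrapolate(elapsed_sec: float, written_gb: float,
                fraction_done: float) -> Tuple[float, float]:
    """Scale a partial dump linearly to a full-dump estimate.

    Assumption: the rate and compression ratio seen so far hold for the
    rest of memory, which may not be true if later regions compress
    differently.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    # Both time and size grow in proportion to the fraction copied.
    return elapsed_sec / fraction_done, written_gb / fraction_done
```

For example, a hypothetical partial run of 1200 sec. and 16G that stopped at
50% would extrapolate to 2400 sec. and 32G.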

--------------------------------
UV1000  memory: 8.85TB  crashkernel=1G@5G
command line  /usr/bin/makedumpfile --non-cyclic -c --message-level 23 -d 31 \
   --map-size 4096 -x /boot/vmlinux-3.9.6-cpw-medusa /proc/vmcore \
   /tmp/cpw/dumpfile

page scanning  175 sec.
copying data  2085 sec. (15G)
(The data copy ran out of disk space at 60%, so the time and size above are
 extrapolated.)
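
Normalizing the two runs per terabyte makes them easier to compare. This
sketch only divides the reported (partly extrapolated) figures by memory size.
The machines and kernels differ, so treat it as a rough comparison. The names
`RUNS` and `per_tb` are hypothetical.

```python
# Figures copied from the two runs above; copy times and sizes are the
# extrapolated values, not measured end-to-end numbers.
RUNS = {
    "UV2000": {"mem_tb": 32.0, "scan_sec": 570, "copy_sec": 5795, "dump_gb": 72},
    "UV1000": {"mem_tb": 8.85, "scan_sec": 175, "copy_sec": 2085, "dump_gb": 15},
}


def per_tb(run: dict) -> dict:
    """Divide each figure of one run by its memory size in TB."""
    mem = run["mem_tb"]
    return {
        "scan_sec_per_tb": run["scan_sec"] / mem,
        "copy_sec_per_tb": run["copy_sec"] / mem,
        "dump_gb_per_tb": run["dump_gb"] / mem,
    }


for name, run in RUNS.items():
    r = per_tb(run)
    print(f"{name}: scan {r['scan_sec_per_tb']:.1f} s/TB, "
          f"copy {r['copy_sec_per_tb']:.1f} s/TB, "
          f"dump {r['dump_gb_per_tb']:.2f} G/TB")
```

Page scanning works out to about 17.8 sec./TB on the UV2000 and 19.8 sec./TB
on the UV1000, which suggests scanning time grows roughly linearly with memory
size. Copying (about 181 vs. 236 sec./TB) varies more between the two systems.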

Notes/observations:
- These systems were idle, so the dumps capture essentially only system
  memory.
- Both stable 3.9.6 and 3.10.0-rc5 worked.
- Using crashkernel=1G,high was usually problematic.  I assume it conflicts
  with something else that uses high memory.  I always use the explicit
  size@offset form, e.g. 1G@5G, choosing the offset by examining /proc/iomem.
- The data-copy time is dominated by compression.  Writing 15G of compressed
  data to /dev/null takes about 35 min.  Writing the same data uncompressed
  (140G) to /dev/null takes about 6 min.
  So a good workaround for a very large system might be to dump uncompressed
  to an SSD.
  Making the crash kernel multi-threaded would produce a big gain.
- Using mmap on /proc/vmcore reduced page-scanning time from 4.4 minutes to
  3 minutes.  It also (unexpectedly) reduced data-copying time from 38 min.
  to 35 min.
  So I think it is worthwhile to push Hatayama's 9-patch set into the kernel.
- I applied a 5-patch set from Takao Indoh to fix reset_devices handling of
  PCI devices.
  I also applied 3 kernel hacks of my own:
    - add a "Crash kernel low" section to /proc/iomem
    - make the crash kernel avoid some things in pci_swiotlb_detect_override(),
      pci_swiotlb_detect_4gb() and register_mem_sect_under_node()
    - return early from cpu_up() in the crash kernel
  I don't understand why these are necessary for my kernels when the same
  problems have not been reported elsewhere.  I'm still investigating and will
  discuss those patches separately.
- My makedumpfile is an mmap-using version with about 10 patches applied.
  I'll check which of those are not in the upstream version and discuss them
  separately.
- My kexec is kexec-tools version 2.0.4 with 3 patches applied.  I'll check
  which of those are not in the upstream version and discuss them separately.
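
Choosing a crashkernel=size@offset reservation by examining /proc/iomem, as
described in the notes, can be sketched as below. It assumes only the simple
size@offset form with K/M/G/T suffixes (the kernel accepts more syntax), and it
only checks that the range falls inside one top-level "System RAM" entry. It
does not check for overlap with nested ranges such as the kernel image. The
helper names and the SAMPLE text are hypothetical, not taken from the systems
above.

```python
import re
from typing import List, Tuple

_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

# Hypothetical /proc/iomem excerpt for illustration only.
SAMPLE = """\
00000000-0009ffff : System RAM
000a0000-000fffff : Reserved
00100000-bfffffff : System RAM
  01000000-0160ffff : Kernel code
c0000000-ffffffff : PCI Bus 0000:00
100000000-83fffffff : System RAM
"""


def parse_size(text: str) -> int:
    """Parse a decimal size with an optional K/M/G/T suffix into bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"bad size: {text!r}")
    return int(m.group(1)) * _SUFFIX.get(m.group(2).upper(), 1)


def parse_crashkernel(value: str) -> Tuple[int, int]:
    """Split the simple 'size@offset' form (e.g. '1G@5G') into bytes."""
    size, sep, offset = value.partition("@")
    if not sep:
        raise ValueError("expected size@offset")
    return parse_size(size), parse_size(offset)


def system_ram_ranges(iomem_text: str) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) pairs of top-level 'System RAM' lines."""
    ranges = []
    for line in iomem_text.splitlines():
        if not line or line[0].isspace():
            continue  # skip blank lines and nested (indented) entries
        span, _, name = line.partition(" : ")
        if name.strip() == "System RAM":
            start, end = (int(x, 16) for x in span.split("-"))
            ranges.append((start, end))
    return ranges


def fits_in_ram(iomem_text: str, size: int, offset: int) -> bool:
    """True if [offset, offset + size) lies inside one System RAM range."""
    if size <= 0:
        return False
    last = offset + size - 1
    return any(start <= offset and last <= end
               for start, end in system_ram_ranges(iomem_text))
```

With the sample text, 1G@5G and 2G@4G fall inside the RAM range that starts at
4G, while 1G@3G lands on the PCI hole and is rejected.  On a live system the
text would come from /proc/iomem instead of SAMPLE.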

-Cliff