Kexec & Memory Zones question

WANG Cong xiyou.wangcong at gmail.com
Tue May 17 22:40:26 EDT 2011


On Tue, 17 May 2011 19:05:13 -0700, Sujit V wrote:

> We found the root cause for this issue in the bootmem allocator.
> 
> The 96GB NUMA system has two memory nodes, each with 48GB. Node 0 had
> zones DMA, DMA32 and Normal;
> node 1 had only zone Normal.
> 
> During early boot (i.e. kernel/setup.c), the bootmem allocator uses the
> find_free_area API on the e820 map to allocate some of its data
> structures, i.e. the bitmap. (The bootmem bitmap tracks free and used
> pages with 1 bit per 4K page; the reserve_bootmem() API is used to
> reserve pages in it.)
> 
> The amount of memory required to represent the bitmap for node 0 with
> 48GB is (48GB / (4K * 8)) = 1.5MB.
> 
> The start address of the free area of size 1.5 MB returned by e820 map
> was
>>> bitmap starts at  PA (0xf9b000) size 1.5MB
> 0xf9b000 + 1.5 MB = 17.13MB
> 
> So the bootmem bitmap used a 1.13MB section of what was supposed to be
> the crashkernel reserved area.
> Later, boot param parsing sees crashkernel=128M@16M and
> reserves the area using reserve_bootmem().
> 
> 
> Later, when paging_init() is called, the bootmem allocator is retired. At
> this point it frees the memory allocated to the bitmap and gives it to
> the system page allocator,
> i.e. pages from 16MB to 17.13MB are given to the system page allocator
> (even though those pages are reserved for the crashkernel).
> 
> So pages in this memory range were handed out for general system use.
> When kexec loaded the kdump kernel into the 128M@16M range, it corrupted
> that memory and we saw the system crash.
> 
> I fixed the boot mem allocator and then things worked correctly.


Yes, this is a bug in the bootmem allocator. Before the switch to
memblock, the old bootmem allocator marked the crashkernel reservation
as exclusive, which means it must not use any memory area already used
by others, so in this case the crashkernel memory reservation should fail.

> 
> 
> Ours is a 2.6.23 kernel.
> The later versions of the kernel have some other mechanism for early
> memory reservation (like early_res & memblock)
> 

Right, I think that kernel version is still using the old bootmem
allocator, so you can change the crashkernel reservation to be
exclusive.



