[PATCH] IA64: kexec allocates too few memory for kdump kernel itself

Jay Lan jlan at sgi.com
Fri Sep 12 20:38:09 EDT 2008


Simon Horman wrote:
>> Hi,
>>
>> It should be mem_phdr, got it from mem_ehdr->e_phdr.
>>
>>> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
>>> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
>>> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
>>> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4
>> Does anyone understand how the array was created and why there
>> is a gap between the i=0 and i=1 entries? I think this is the problem,
>> but I do not know how to fix it, so I tried to work around it.
>>
>> The statement my patch replaced was totally broken:
>>  -			if (loaded_segments[loaded_segments_num].end !=
>>  -				phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
>>  -				break;
>>  +			if (loaded_segments[loaded_segments_num].end <
>>  +			    (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
>>  +				loaded_segments[loaded_segments_num].end
>>  +				  = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
>>
>> My debugging showed that even when "loaded_segments[loaded_segments_num].end"
>> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", the two were treated as equal
>> and execution continued to the next statement.  However, if I assign both
>> expressions to local variables and compare those, the 'break' statement
>> is executed correctly when the two values differ. Unfortunately,
>> the kdump kernel then _always_ hangs.
>>
>> I believe the intent of the original statement is to ensure there is
>> no gap between entries of the mem_phdr array. But if there is a gap,
>> kexec should simply exit with failure. The 'break' statement just
>> created a loaded_segments[] array that split the kernel memory segment
>> into multiple entries and made the kdump kernel hang in
>> find_memory(). That the IA64 (at least 2.6.27-rc4) kdump kernel works
>> in some cases today is pure luck.
>>
>> I believe the real fix is to fix the contents of the mem_phdr array.
>> Since I do not know how to fix it, my patch closes up the gap
>> wherever there is a gap between entries of the mem_phdr array.
>>
>> Does it make more sense to you now, Simon?
> 
> Hi Jay,
> 
> yes that does make sense. I'd like to poke around and see
> if mem_phdr can be fixed.

I think the whole ehdr is read from the kernel binary in
slurp_decompress_file.

Bernhard reported a kdump kernel boot problem caused by a patch
regarding per-cpu variables access in early boot code:
http://article.gmane.org/gmane.linux.ports.ia64/19380

I backed out the offending patch and I was no longer able to
reproduce this problem.

So, it is safe to say the problem was due to how we process
data from the vmlinuz.

The code I tried to change:
-             if (loaded_segments[loaded_segments_num].end !=
-                     phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
-                     break;
has two problems:
1) The '!=' operator has higher precedence than '&', so the comparison
   is evaluated first and its 0-or-1 result is then masked. For the
   code to do what it intends, the statement should be:
              if (loaded_segments[loaded_segments_num].end !=
                     (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
                     break;
2) When the 'break' really is executed, it splits the kernel
   segment into multiple segments.

The code needs fixing even if the problem I saw was the result of
a bug in the kernel.
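
For reference, the gap in the quoted mem_phdr dump can be checked with a small standalone sketch. The helper names are hypothetical, the 64K ELF_PAGE_SIZE is an assumption, and it also assumes a segment's end is the page-rounded p_paddr + p_memsz of the previous entry, which may not be exactly how kexec computes it. close_gap() mirrors the statement from my patch.

```c
#include <stdio.h>

/* Assumed page size for illustration only; not taken from kexec-tools. */
#define ELF_PAGE_SIZE (1ULL << 16)

/* Hypothetical helpers: round an address down or up to a page boundary. */
static unsigned long long page_down(unsigned long long a)
{
	return a & ~(ELF_PAGE_SIZE - 1);
}

static unsigned long long page_up(unsigned long long a)
{
	return (a + ELF_PAGE_SIZE - 1) & ~(ELF_PAGE_SIZE - 1);
}

/* Bytes between the end of the previous entry and the next entry's start. */
static unsigned long long gap_before(unsigned long long prev_end,
				     unsigned long long p_paddr)
{
	unsigned long long start = page_down(p_paddr);

	return prev_end < start ? start - prev_end : 0;
}

/* Same effect as the patched statement: pull 'end' up to the next start. */
static unsigned long long close_gap(unsigned long long end,
				    unsigned long long p_paddr)
{
	unsigned long long start = page_down(p_paddr);

	return end < start ? start : end;
}
```

Under those assumptions, entry i=0 ends at page_up(0x3018000000 + 0xd04480) = 0x3018d10000 while i=1 starts at 0x3018d20000, a gap of 0x10000; entry i=1 ends at 0x3018d30000, exactly where i=2 starts, so there is no gap there.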

Thanks,
jay


