[PATCH] IA64: kexec allocates too few memory for kdump kernel itself
Simon Horman
horms at verge.net.au
Mon Sep 15 01:47:03 EDT 2008
On Fri, Sep 12, 2008 at 05:38:09PM -0700, Jay Lan wrote:
> Simon Horman wrote:
> >> Hi,
> >>
> >> It should be mem_phdr, got it from mem_ehdr->e_phdr.
> >>
> >>> i=0, p_paddr=3018000000, p_memsz=d04480, p_offset=10000, p_type=1
> >>> i=1, p_paddr=3018d20000, p_memsz=9620, p_offset=d20000, p_type=1
> >>> i=2, p_paddr=3018d30000, p_memsz=564490, p_offset=d30000, p_type=1
> >>> i=3, p_paddr=0, p_memsz=0, p_offset=0, p_type=4
> >> Does anyone understand how the array were created and why there
> >> was a gap between i=0 and i=1 entries? I think this is the problem
> >> but i do not know how to fix it, so tried to work around it.
> >>
> >> The statement my patch replaced was totally broken:
> >> - if (loaded_segments[loaded_segments_num].end !=
> >> - phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
> >> - break;
> >> + if (loaded_segments[loaded_segments_num].end <
> >> + (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
> >> + loaded_segments[loaded_segments_num].end
> >> + = phdr->p_paddr & ~(ELF_PAGE_SIZE-1);
> >>
> >> My debugging showed that when "loaded_segments[loaded_segments_num].end"
> >> != "phdr->p_paddr & ~(ELF_PAGE_SIZE-1)", they were treated as equal
> >> and continue to next statement. However, if i assign both expression
> >> to local variables and do comparison, the 'break' statement is
> >> executed correctly when two values are not the same. Unfortunately,
> >> consequently the kdump kernel would _alawys_ hang.
> >>
> >> I believe the intent of the original statement is to ensure there is
> >> no gap between entries of mem_phdr array. But if there is a gap,
> >> kexec should simply exit with failure. The 'break' statement just
> >> created a loaded_segment[] array that broke the kernel memory segment
> >> into multiple entries and resulted in the kdump kernel hang in
> >> find_memory(). The IA64 (at least 2.6.27-rc4) kdump kernel works in
> >> some cases today are simply out of luck.
> >>
> >> I believe the real fix is to fix the contents of the mem_phdr array.
> >> Since i do not know how to fix it, my patch would close up the
> >> gap where there is the a gap between entries of the mem_phdr array.
> >>
> >> Does it make more sense to you now, Simon?
> >
> > Hi Jay,
> >
> > yes that does make sense. I'd like to poke around and see
> > if mem_phdr can be fixed.
>
> I think the whole ehdr is read from the kernel binary in
> slurp_decompress_file.
>
> Bernhard reported a kdump kernel boot problem caused by a patch
> regarding per-cpu variables access in early boot code:
> http://article.gmane.org/gmane.linux.ports.ia64/19380
>
> I backed out the offending patch and i was no longer able to
> reproduce this problem.
>
> So, it is safe to say the problem was due to how we process
> data from the vmlinuz.
>
> The code i tried to change:
> - if (loaded_segments[loaded_segments_num].end !=
> - phdr->p_paddr & ~(ELF_PAGE_SIZE-1))
> - break;
> has two problems:
> 1) '!=' operation takes precedence over '&'. If the code is to
> do what it intends to do, the statement should be:
> if (loaded_segments[loaded_segments_num].end !=
> (phdr->p_paddr & ~(ELF_PAGE_SIZE-1)) )
> break;
Good point. That is definately a problem.
> 2) When the 'break' is really executed, you breaks the kernel
> segment into multiple segments.
I also agree that is a problem. I just wonder what is the
best thing to do about it.
> The code needs fix even if the problem i saw was a result of
> a bug in the kernel.
Agreed.
For now can you make a patch that just fixes precedence logic bug?
Lets merge that and then tackle the more complex broken segment issue.
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en
More information about the kexec
mailing list