kdump failed because of hotplug memory adding in kdump kernel

Thu Jan 9 16:56:23 EST 2014

On Thu, 2014-01-09 at 16:27 -0500, Vivek Goyal wrote:
> On Thu, Jan 09, 2014 at 11:34:30AM -0700, Toshi Kani wrote:
> > On Thu, 2014-01-09 at 13:23 -0500, Vivek Goyal wrote:
> > > On Thu, Jan 09, 2014 at 10:24:25AM -0700, Toshi Kani wrote:
> > > 
> > > [..]
> > > > > I think creating a new command line option is simpler as compared to
> > > > > creating a new flag in bootparam which in turn disables memory hotplug.
> > > > > More users can use that option. For example, if for some reason the
> > > > > hotplug code is crashing, one can just disable it on the command line
> > > > > as a workaround and move on.
> > > > 
> > > > I do not have a strong opinion about having such an option.  However,
> > > > I think it is more user friendly if the exactmap option keeps working
> > > > on its own on any platform.
> > > 
> > > I think we should create internally a variable which will disable memory
> > > hotplug. And set that variable based on memmap=exactmap, mem=X and also
> > > provide a way to disable memory hotplug directly using command line
> > > option.
> > > 
> > > Current kexec-tools can use memmap=exactmap and be happy. I am writing
> > > a new kexec syscall and will not be using memmap=exactmap and would need
> > > to use that command line option to disable memory hotplug behavior.
> > 
> > Sounds good to me.
> 
> Nobody responded to my other question, so I would ask it again.
> 
> Assume we have disabled hotplug memory in the second kernel. The first
> kernel saw hotplug memory, and assume the crash kernel reserved region
> came from there. We will pass this memory in bootparams to the second
> kernel and it will show up in the E820 map. It should still be accessible
> in the second kernel, is that right?

Yes.

> Or is there some dependency on ACPI doing some magic before this memory
> range is available in the second kernel?

No.  The 1st kernel reserves the crash kernel region, and a reserved
region cannot be hot-deleted.  So, this region continues to be
accessible by the 2nd kernel without any additional operation.

I am more curious to know how makedumpfile decides what memory ranges to
dump.  The 1st kernel may have performed memory hot-add / delete
operations before a crash, so makedumpfile needs to know the physical
address ranges that were valid at the time of the crash, and cannot rely
on the E820 map from the BIOS (which may be stale).  Am I right to assume
that makedumpfile gets them from the page tables of the 1st kernel?

Thanks,
-Toshi