[PATCH -mm] kexec jump -v9

Wed Mar 12 15:37:40 EDT 2008

On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> "Huang, Ying" <ying.huang at intel.com> writes:
> 
> > Yes. The entry point should be saved in dump.elf itself, this can be
> > done via a user-space tool such as "makedumpfile". Because
> > "makedumpfile" is also used to exclude free pages from disk image, it
> > needs a communication method between two kernels (to get backup pages
> > map or something like that from kernel A). We have talked about this
> > before.
> >
> > - Your opinion is to communicate via the purgatory. (But I don't know
> > how to communicate between kernel A and purgatory).
> 
> How about the return address on the stack?
> 

I think he needs to pass on much more data than just return address. 

IIUC, he needs to pass backup pages map to new kernel, so that any
user space tool can use backup pages map to reconstruct/rearrange the
first kernel's memory core and tools like makedumpfile can do filtering
before hibernated images is saved.

This brings me to a random thought. Can we break the process of loading
a hibernation kernel in two steps.

- In first step just do the memory reservation for running second kernel.
  (kexec -l <dummpy-file-for-reserving-memory>)

- This memory map of reserved pages is exported to user space.

- Use this memory map and regenerate the hibernation kernel initrd
  (rootfs.gz) and put the memory map there. This memory map can be used
  by makedumpfile in second kernel for filtering.

This way it will user space to user space communication of information 
which gets fixed at kernel loading time.

> > - Eric's opinion is to communicate between the user space in kernel A
> > and user space in kernel B.
> 
> Purgatory is for all intents and purposes user space.  Because the
> return address falls on the trampoline page we won't know it's
> address before we call kexec.  But a return address and a stack
> on that page should be a perfectly good way to communicate.
> 
> > - My opinion is to communicate between two kernel directly.
> >
> > I think as a minimal infrastructure patch, we can communicate minimal
> > information between user space of two kernels. When we have consensus on
> > this topic, we can use makedumpfile for both excluding free pages and
> > saving the entry point. Now, we can save the entry point in a separate
> > file or I can write a simple tool to do this.
> 
> We need a fixed protocol so we do not make assumptions about how things
> will be implemented, allowing kernels to diverge and kinds of other
> good things.
> 
> For communicating extra information from the kernel being shut down
> we have elf notes.
> 
> Direct kernel to kernel communication is forbidden.  We must have
> a well defined protocol.  Allowing the implementations to change
> at their different speeds, and still work together.
> 

Agreed. Without a proper protocol, we will often run into issues that
X version of kernel does not work with Y version of hibernation kernel
etc.

> >> May be we can have a separate load flag (--load-resume-image) to mark
> >> that we are resuming an hibernated image and kexec does not have to
> >> prepare commandline, does not have to prepare zero page/setup page etc.
> >
> > There is already similar flag in original kexec-tools implementation:
> > "--args-none". If it is specified, kexec-tools does not prepare command
> > line and zero page/setup page etc. I think we can just re-use this flag.
> > And If it is desired an alias is good for me too.
> 
> My gut feel is we look at the image and detect what kind it is, and simply
> not enable image processing after we have read the note that says it
> is a resumable core or whatever.
> 

That makes sense. Just that we shall have to put some kind of ELF NOTE
or some other identifier in resumable core file to identify it.

Thanks
Vivek