[PATCH] kexec based hibernation: a prototype of kexec multi-stage load

Huang, Ying ying.huang at intel.com
Tue May 13 21:57:46 EDT 2008

Hi, Vivek,

On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote:
> On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
> > This patch implements a prototype of kexec multi-stage load. With this
> > patch, the "backup pages map" can be passed to kexeced kernel via
> > /sbin/kexec; and the sys_kexec_load can be used to load large
> > hibernated image with huge number of segments.
> > 
> > 
> Hi Huang,
> Had a quick look at the patch. Will review in detail soon. Had few
> thoughts.
> In general, these patches are on top of previous kexec jump patches.
> It would be good if you could repost your updated patches so that
> I can apply the patches and get some testing going.

The kexec jump patch v9 is sufficient for this patch to work. I have no
new version of kexec jump patch so far.

> Last time I tried the patches (V9) and kexec jump did not work for me. I
> was not getting timer interrupts in second kernel. Then I had to put 
> LAPIC and IOAPIC in legacy mode and then at least one-way jump started working.
> I am not sure how the next kernel boots for you without putting APICs
> in legacy mode. (Yet to make returning back to original kernel work
> using V9). 

Can normal kexec (without kexec jump) work without putting LAPIC and
IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
into legacy mode before kexec and restore them after?

The kexec jump patch works well on my IBM T42. But it seems that the
IOAPIC is disabled in BIOS, so I can only use i8259 and LAPIC on this
machine.

> > In kexec based hibernation, resuming from disk is implemented as
> > loading the hibernated disk image with sys_kexec_load(). But unlike
> > the normal kexec load, the hibernated image may have huge number of
> > segments. So multi-stage loading is necessary for kexec load based
> > resuming from disk implementation.
> I understand that hibernated images are huge. But why do we require
> multi stage loading? I knew there was a maximum segment limit in kexec.
> But I think we can change that limit. Anything else prevents us from
> loading large images in one go?

There are two reasons for multi-stage loading:

- Pass the backup pages map from the original kernel (A) to the kexeced
kernel (B), because it is not known before loading. We have discussed
this before.

- Load a large hibernated image. The hibernated image can be not only
large but also discontiguous. For example, if the physical memory size
is 4G and every other page is free, the image consists of nearly 2G of
single-page segments (about 512K segments with 4K pages). Loading these
segments in one go is impossible, so multi-stage load is necessary. And
if the hibernated image is compressed, it is also very difficult to
load it in one go because of the anonymous pages needed.
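
To make the numbers in the second point concrete, here is a rough
back-of-the-envelope sketch. It is not part of the patch: the 4K page
size is an assumption, the per-call segment limit is just a parameter,
and all ex_* names are made up for illustration. For comparison,
KEXEC_SEGMENT_MAX is currently 16.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size: 4 KiB (illustration only, not taken from the patch). */
#define EX_PAGE_SIZE 4096ULL

/*
 * Worst case from the example: every other page is in use, so each used
 * page forms its own one-page segment.
 */
static uint64_t ex_worst_case_segments(uint64_t mem_bytes)
{
	uint64_t pages = mem_bytes / EX_PAGE_SIZE;

	return pages / 2;
}

/* Total amount of data carried by those one-page segments. */
static uint64_t ex_worst_case_bytes(uint64_t mem_bytes)
{
	return ex_worst_case_segments(mem_bytes) * EX_PAGE_SIZE;
}

/* Number of load calls needed if each call carries at most per_call segments. */
static uint64_t ex_calls_needed(uint64_t segments, uint64_t per_call)
{
	assert(per_call > 0);
	return (segments + per_call - 1) / per_call;
}
```

With 4G of memory this gives 524288 segments carrying 2G of data, far
beyond what a single load call accepts today.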

> > And, multi-stage loading is also
> > necessary for parameter passing from original kernel to kexeced kernel
> > because some information such as "backup pages map" is not available
> > before loading.
> > 
> > 
> > Four stages are defined:
> > 
> > - KS_start: start stage; begin a new kexec loading; there must be only
> >   one KS_start stage in one kexec loading.
> > 
> > - KS_mid: middle stage; continue load some segments; there may be many
> >   or zero KS_mid stages in one kexec loading; follows a KS_start or
> >   KS_mid stage.
> > 
> > - KS_final: final stage; finish a kexec loading; there must be only
> >   one KS_final stage in one kexec loading; follows a KS_start or
> >   KS_mid stage.
> > 
> > - KS_full: back compatible with original loading semantics, finish all
> >   work of a kexec loading in one KS_full stage.
> > 
> > 
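
The ordering rules for the four stages can be sketched as a small
checker. This is only an illustration of the rules listed above, not
code from the patch; the enum and function names (ex_*) are
hypothetical and the real identifiers and values may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative names only; not the identifiers used in the patch. */
enum ex_stage { EX_KS_START, EX_KS_MID, EX_KS_FINAL, EX_KS_FULL };

/* Return true if stages[0..n-1] describes exactly one complete kexec loading. */
static bool ex_valid_load_sequence(const enum ex_stage *stages, size_t n)
{
	size_t i;

	assert(stages != NULL || n == 0);
	if (n == 0)
		return false;

	/* KS_full does the whole load in a single call. */
	if (stages[0] == EX_KS_FULL)
		return n == 1;

	/* Otherwise: one KS_start, zero or more KS_mid, one KS_final. */
	if (n < 2 || stages[0] != EX_KS_START || stages[n - 1] != EX_KS_FINAL)
		return false;
	for (i = 1; i + 1 < n; i++)
		if (stages[i] != EX_KS_MID)
			return false;
	return true;
}
```
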
> > Overlapping between pages of different segments is allowed to support
> > "parameter passing".
> > 
> > 
> > During loading, a hash table mapped from destination page to source
> > page is used instead of original linear mapping
> > implementation. Because the hibernated image may be very large (up to
> > near the size of physical memory), it is very time-consuming to search
> > a source page given the destination page, which is used to check
> > whether a newly allocated page is in the range of allocated
> > destination pages.
> This seems to be an optimization of kexec so that it becomes efficient
> in loading large images (containing large number of segments). Probably
> this can be a separate patch.

If it is desired, I can separate it into another patch.
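
For reference while reviewing, the idea of the destination-to-source
hash table can be sketched roughly as follows. This is a simplified
illustration, not the code in the patch: the structure layout, the
bucket count, the trivial hash function and all ex_* names are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define EX_HASH_BITS 10
#define EX_HASH_SIZE (1UL << EX_HASH_BITS)

/* One mapping from a destination page frame to the page holding its data. */
struct ex_map_entry {
	unsigned long dest_pfn;
	unsigned long src_pfn;
	struct ex_map_entry *next;	/* chain of entries in the same bucket */
};

struct ex_page_map {
	struct ex_map_entry *buckets[EX_HASH_SIZE];
};

/* Deliberately trivial hash: low bits of the destination page frame number. */
static unsigned long ex_hash(unsigned long dest_pfn)
{
	return dest_pfn & (EX_HASH_SIZE - 1);
}

/* Add a caller-allocated entry at the head of its bucket chain. */
static void ex_map_insert(struct ex_page_map *map, struct ex_map_entry *e)
{
	unsigned long b;

	assert(map != NULL && e != NULL);
	b = ex_hash(e->dest_pfn);
	e->next = map->buckets[b];
	map->buckets[b] = e;
}

/* Find the entry for dest_pfn, or NULL if it is not a destination page. */
static const struct ex_map_entry *
ex_map_lookup(const struct ex_page_map *map, unsigned long dest_pfn)
{
	const struct ex_map_entry *e;

	for (e = map->buckets[ex_hash(dest_pfn)]; e != NULL; e = e->next)
		if (e->dest_pfn == dest_pfn)
			return e;
	return NULL;
}
```

The check "is this newly allocated page already a destination page?"
then only walks one bucket chain instead of the whole linear list.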

> IMHO, we can just first write a minimal patch where one can just switch
> between kernels. Once that patch is upstream, we can enhance
> it to do the hibernation and saving core functionality. Incremental
> review becomes easier. Your last patch (v9) was a good attempt at that and
> I thought very soon we shall have something mergable.

Agreed. We can first focus on the kexec jump patch. But as discussed in
the last kexec jump (v9) thread, we need a protocol for parameter
passing between kernel A and kernel B. So we can use this patch as a
prototype for the communication protocol.

> > The original mapping is only used by assembly code
> > to swap the page contents. This map is also exported to user space via
> > /proc/kexec_pgmap, so that /sbin/kexec can use it to construct the
> > "backup pages map" parameter for kexeced kernel.
> > 
> > 
> > This patch is based on Linux kernel 2.6.25 and kexec_jump patch, and
> > has been tested on an IBM T42.
> > 
> Is kexec_jump v9 patch good enough, or do you have another internal
> version of the patch on top of which this patch applies?

v9 is the latest kexec jump patch; there is no other internal version
so far.

Best Regards,
Huang Ying
