More on kexec/purgatory handover
Petr Tesarik
ptesarik at suse.cz
Sat May 9 06:21:59 PDT 2015
Hi all,
note that I'm not subscribed to the xen-devel mailing list, but Jan
quoted from this mail of Andrew's in SUSE Bugzilla:
> This is all from a while ago. It is quite possible that we didn't
> actually tested the compatibility case with a 64bit dom0 kernel,
> although I certainly did test earlier versions of the series with a
> 32bit dom0 kernel. The work was done long before XenServer moved to a
> 64bit dom0, and was done by deleting everything and starting from scratch.
>
> IIRC, the low 640k mappings is a purgatory bug rather than Linux, and
> has been fixed upstream in kexec-tools since. (I recall that it used to
> take a backup copy of the IVT for some reason)
This is not entirely correct.
Originally, kexec (in the Linux kernel) was supposed to provide an
environment equivalent to the one set up by a boot loader, i.e. kexec
is just another bootloader like LILO or GRUB. The first implementation
indeed switched back to 16-bit real mode before passing control to the
secondary kernel's boot code...
It was at that time that the need arose to save the low 640K of RAM
somewhere else, because the 16-bit bootloader had to use parts of this
memory range, not least because it also made BIOS calls, and the BIOS
used this range for its data.
This solution was suboptimal for numerous reasons, e.g. tight
restrictions on where the purgatory code could be placed in physical
RAM, or incompatibility with UEFI booting. As an improvement, a 32-bit
boot protocol was introduced.
At entry, the CPU must be in 32-bit protected mode with paging
disabled. This explains why you never noticed any issues related to
pagetables with 32-bit kernels. Since paging is disabled, there are
none. ;-)
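
To make the 32-bit entry condition concrete, here is a minimal sketch
of a check on a saved CR0 value. The bit positions (PE is bit 0, PG is
bit 31) are architectural; the function name is hypothetical and is
not taken from Linux, Xen or kexec-tools.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_PE (UINT32_C(1) << 0)   /* Protection Enable */
#define X86_CR0_PG (UINT32_C(1) << 31)  /* Paging */

/*
 * Hypothetical helper: true if a CR0 value describes the state the
 * 32-bit boot protocol expects, i.e. protected mode with paging off.
 */
static bool cr0_ok_for_32bit_entry(uint32_t cr0)
{
    return (cr0 & X86_CR0_PE) != 0 && (cr0 & X86_CR0_PG) == 0;
}
```

With PG clear there is no address translation, so the contents of CR3
are irrelevant at this entry point.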
The 32-bit protocol limits the location of the secondary kernel to the
low 4G of physical RAM (for obvious reasons). This is now solved by a
64-bit boot protocol. Since paging must always be enabled in Long Mode,
it must be set up somehow. The Linux documentation says: "The range with
setup_header.init_size from start address of loaded kernel and zero
page and command line buffer get ident mapping".
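
As a rough illustration of what that sentence asks for, the following
sketch computes the three physical ranges that would need an identity
mapping. It assumes 2 MiB pages and a 4 KiB zero page; all names here
(phys_range, round_to_2m, ident_ranges) are made up for the example
and do not come from any existing code base.

```c
#include <stdint.h>

#define PAGE_2M        (UINT64_C(1) << 21)
#define ZERO_PAGE_SIZE UINT64_C(4096)   /* assumed size of the zero page */

struct phys_range {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
};

/* Round a byte range outward to 2 MiB boundaries. */
static struct phys_range round_to_2m(uint64_t start, uint64_t len)
{
    struct phys_range r;

    r.start = start & ~(PAGE_2M - 1);
    r.end = (start + len + PAGE_2M - 1) & ~(PAGE_2M - 1);
    return r;
}

/*
 * Hypothetical helper: list the ranges that need virtual == physical
 * mappings: init_size bytes from the kernel load address, the zero
 * page, and the command line buffer.
 */
static void ident_ranges(uint64_t kernel_start, uint32_t init_size,
                         uint64_t zero_page,
                         uint64_t cmdline, uint32_t cmdline_size,
                         struct phys_range out[3])
{
    out[0] = round_to_2m(kernel_start, init_size);
    out[1] = round_to_2m(zero_page, ZERO_PAGE_SIZE);
    out[2] = round_to_2m(cmdline, cmdline_size);
}
```

Whoever ends up owning the page tables (old kernel, Xen or purgatory)
would then have to cover at least these ranges, plus whatever the
purgatory itself touches before it jumps to the new kernel.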
The problematic part here is that the Linux kexec code is split between
the kernel and the purgatory. Unfortunately, the handover between the
old kernel and the purgatory is not well defined, so the actual kexec
code is probably the best documentation available.
There are currently two versions of the Linux purgatory: one in
kexec-tools and one in the kernel. Neither of them sets CR3. On the
other hand, the Linux kernel does set CR3 (see
arch/x86/kernel/relocate_kernel_64.S). This makes me believe that the
64-bit kexec entry point expects paging to be set up by the old kernel.
If Xen plays the role of the old kernel, it must also set up paging.
The question is how.
Let's start a discussion on the kexec mailing list (in Cc) to clarify
what should be done by the old kernel and what should be done by the
purgatory code.
Petr Tesarik