More on kexec/purgatory handover

Eric W. Biederman ebiederm at xmission.com
Tue May 12 22:26:39 PDT 2015


Petr Tesarik <ptesarik at suse.cz> writes:

> Hi all,
>
> note that I'm not subscribed to the xen-devel mailing list, but Jan
> quoted from this mail of Andrew's in SUSE Bugzilla:
>
>> This is all from a while ago.  It is quite possible that we didn't
>> actually test the compatibility case with a 64bit dom0 kernel,
>> although I certainly did test earlier versions of the series with a
>> 32bit dom0 kernel.  The work was done long before XenServer moved to a
>> 64bit dom0, and was done by deleting everything and starting from scratch.
>> 
>> IIRC, the low 640k mapping is a purgatory bug rather than Linux, and
>> has been fixed upstream in kexec-tools since.  (I recall that it used to
>> take a backup copy of the IVT for some reason)
>
> This is not entirely correct.
>
> Originally, kexec (in Linux kernel) was supposed to provide an
> environment which is equivalent to the boot loader, i.e. kexec is just
> another bootloader like LILO or GRUB. The first implementation indeed
> switched back to 16-bit real mode before passing control to the
> secondary kernel's boot code...
>
> It was at that time that the need arose to save the low 640K of RAM
> somewhere else, because the 16-bit bootloader had to use parts of this
> memory range, not the least because it also made BIOS calls, and BIOS
> used this range for its data.
>
> This solution was suboptimal for numerous reasons, e.g. very limited
> location of the purgatory code in physical RAM, or incompatibility with
> UEFI booting. As an improvement, a 32-bit boot protocol was introduced.
> At entry, the CPU must be in 32-bit protected mode with paging
> disabled. This explains why you never noticed any issues related to
> pagetables with 32-bit kernels. Since paging is disabled, there are
> none. ;-)
>
> The 32-bit protocol limits the location of the secondary kernel to the
> low 4G of physical RAM (for obvious reasons). This is now solved by a
> 64-bit boot protocol. Since paging must always be enabled in Long Mode, it
> must be set up somehow. The Linux documentation says: "The range with
> setup_header.init_size from start address of loaded kernel and zero
> page and command line buffer get ident mapping".
>
> The problematic part here is that Linux kexec code is split between
> kernel and purgatory. Unfortunately, the handover between the old
> kernel and the purgatory is not so well defined, so the actual kexec
> code is probably the best documentation available.
>
> There are currently two versions of the Linux purgatory: in kexec-tools
> and in the kernel. Neither of them sets CR3. On the other hand, the Linux
> kernel does set CR3 (see arch/x86/kernel/relocate_kernel_64.S). This
> makes me believe that the 64-bit kexec entry point expects that paging
> is set up by the old kernel. If Xen plays the role of the old kernel,
> it must also set up paging. The question is how.
>
> Let's start a discussion on the kexec mailing list (in Cc) to clarify
> what should be done by the old kernel and what should be done by the
> purgatory code.


Yes.  The assumption is that, for the addresses claimed by the image
that is loaded (think the physical addresses in the ELF PHDRs), physical
addresses are mapped one-to-one with virtual addresses.
In practice I think I wound up using huge pages and mapping everything
one-to-one on x86_64 because it was easier than mapping a specific subset.
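
A minimal sketch of what "one-to-one mapped with huge pages" means, written as a Python model rather than real page-table code. The flag bits (present, writable, page-size) follow the x86-64 page-directory entry layout; the function names and the dictionary standing in for a page directory are hypothetical and are not taken from the kernel or from kexec-tools.

```python
PRESENT = 1 << 0
WRITABLE = 1 << 1
PAGE_SIZE_BIT = 1 << 7          # marks a page-directory entry as a 2 MiB page
HUGE_SHIFT = 21
HUGE_SIZE = 1 << HUGE_SHIFT     # 2 MiB
ADDR_MASK = 0x000FFFFFFFE00000  # bits 21..51: physical base of a 2 MiB page


def identity_pde(phys):
    """Entry that maps the 2 MiB page at `phys` onto the same virtual address."""
    if phys % HUGE_SIZE:
        raise ValueError("2 MiB pages must be 2 MiB aligned")
    return phys | PRESENT | WRITABLE | PAGE_SIZE_BIT


def build_identity_map(start, end):
    """Return {virtual 2 MiB base: entry} covering the range [start, end)."""
    first = start & ~(HUGE_SIZE - 1)  # round down to a 2 MiB boundary
    return {base: identity_pde(base) for base in range(first, end, HUGE_SIZE)}


def translate(mapping, vaddr):
    """Software walk: find the covering entry and rebuild the physical address."""
    entry = mapping[vaddr & ~(HUGE_SIZE - 1)]
    return (entry & ADDR_MASK) | (vaddr & (HUGE_SIZE - 1))
```

With such a map, `translate(build_identity_map(0x1000000, 0x1400000), 0x1234567)` gives back `0x1234567`, which is the property the loaded image relies on. Mapping whole 2 MiB pages is coarser than the image strictly needs, which matches the "easier than a specific subset" remark above.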

The low 640k was weird.  We copied it off in purgatory so that it could
be captured in a dump.  The Linux kernel itself winds up using that
memory, fundamentally because to fire up subsequent processors you have
to have memory in the low 640k: processors start in real mode and
the startup IPI only takes an address in the low 1M.  I also remember
things with interrupt descriptor tables.
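
To illustrate the startup IPI constraint: the IPI carries an 8-bit vector, and the target processor begins executing in real mode at physical address vector << 12 (CS = vector << 8, IP = 0). The sketch below only models that arithmetic; the function names are hypothetical, and the 640K limit is an assumption reflecting the conventional-memory boundary mentioned above, not something the IPI itself enforces.

```python
PAGE = 0x1000            # startup code must sit on a 4 KiB boundary
ONE_MIB = 0x100000       # hard limit imposed by the 8-bit vector
CONVENTIONAL = 0xA0000   # 640K: assumed end of ordinary low RAM


def sipi_vector(trampoline_phys):
    """Vector byte for a startup IPI that starts a CPU at trampoline_phys."""
    if trampoline_phys % PAGE:
        raise ValueError("trampoline must be 4 KiB aligned")
    if not 0 <= trampoline_phys < ONE_MIB:
        raise ValueError("startup IPI cannot address memory above 1M")
    if trampoline_phys >= CONVENTIONAL:
        raise ValueError("address is above 640K, outside ordinary low RAM")
    return trampoline_phys >> 12


def real_mode_entry(vector):
    """(CS, IP) the started processor begins with in real mode."""
    return (vector << 8, 0)


def linear_address(cs, ip):
    """Real-mode segment:offset to linear address."""
    return (cs << 4) + ip
```

For a trampoline at 0x9000 the vector is 9 and the processor starts at CS:IP = 0x0900:0x0000, which is linear address 0x9000 again. Any address that cannot be expressed this way is unreachable for the startup IPI, which is why some low memory has to stay reserved for it.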

Eric





