More on kexec/purgatory handover
Eric W. Biederman
ebiederm at xmission.com
Wed May 13 00:35:36 PDT 2015
"Jan Beulich" <JBeulich at suse.com> writes:
>>>> On 13.05.15 at 07:26, <ebiederm at xmission.com> wrote:
>> The low 640k was weird. We copied it off in purgatory so that it could
>> be captured in a dump. The linux kernel itself winds up using that
>> memory fundamentally because to fire up subsequent processors you have
>> to have memory in the low 640k as processors start in real-mode and
>> the startup IPI only takes an address in the low 1M. I also remember
>> things with interrupt descriptor tables.
>
> Right, but the point to clarify is whether it is reasonable for the
> purgatory and/or new kernel to expect the old kernel to set up
> mappings for special regions like this one. As said before - I don't
> think this should be done; the old (possibly half broken) kernel
> shouldn't be forced to do any more than the absolute minimum
> amount of work to be able to transfer control.
When transferring control in 64bit mode on x86_64, a one-to-one virtual
to physical identity mapping must be set up. That identity map must be
set up before transferring control to the kexec destination. That
mapping must cover all pages in the destination image.
Those page tables should be created before the old kernel gets into a
broken state.
Fundamentally if you are transferring control in long mode you have to
set up some page table. A giant identity mapped page table that can use
1G or 2M pages takes up very little memory, and can be built very simply
and easily before the transfer of control takes place.
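To put a rough number on "very little memory", here is a minimal sketch that counts the 4 KiB table pages an identity map of [0, mem_bytes) would need under standard 4-level x86_64 paging. The function names are made up for illustration and do not come from any kernel source.

```c
#include <stdint.h>

#define SZ_1G   (1ULL << 30)   /* covered by one PDPT entry (or one PD)  */
#define SZ_512G (1ULL << 39)   /* covered by one PML4 entry (or one PDPT) */

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* Table pages needed when the leaves are 1G pages: one PML4 plus one
 * PDPT per 512G of memory. Hypothetical helper. */
uint64_t tables_for_1g_pages(uint64_t mem_bytes)
{
    return 1 + div_round_up(mem_bytes, SZ_512G);
}

/* Table pages needed when the leaves are 2M pages: one PML4, one PDPT
 * per 512G, and one page directory per 1G. Hypothetical helper. */
uint64_t tables_for_2m_pages(uint64_t mem_bytes)
{
    return 1 + div_round_up(mem_bytes, SZ_512G)
             + div_round_up(mem_bytes, SZ_1G);
}
```

For a 64G machine this works out to 2 table pages (8K) with 1G pages, or 66 table pages (264K) with 2M pages.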
All you have to do when you are in a half broken state is load cr3.
Possibly after verifying a checksum.
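As a rough illustration of the checksum step, the sketch below computes a rotate-and-xor checksum over the prebuilt tables and compares it with a value recorded when the tables were built. The checksum algorithm and the names table_checksum and tables_intact are assumptions made for this example, not what any existing implementation uses; the actual cr3 load is privileged and is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate-and-xor checksum over the page-table words. A stand-in chosen
 * for brevity; a real implementation might prefer a stronger hash. */
uint64_t table_checksum(const uint64_t *tables, size_t n_words)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n_words; i++)
        sum = ((sum << 1) | (sum >> 63)) ^ tables[i];
    return sum;
}

/* Returns non-zero if the tables still match the checksum recorded at
 * build time. The crash path would load cr3 only in that case. */
int tables_intact(const uint64_t *tables, size_t n_words, uint64_t expected)
{
    return table_checksum(tables, n_words) == expected;
}
```

The point of the design is that the half broken kernel only reads memory and compares one value; all of the construction work happened earlier.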
640k in this case I don't think is particularly special, and certainly
not worth a special case. The in-kernel implementation on x86_64 sets
up a page table for all of memory which because of the availability of
huge pages winds up being simple and trivial.
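To show how little there is to it, here is a user-space sketch that fills a buffer with an identity map of the first n_gib gigabytes using 2M pages. It is not the in-kernel code; the function name, the buffer layout (PML4, then one PDPT, then one page directory per gigabyte) and the limit of 512G are assumptions made to keep the example short. The flag bits are the architectural present, writable and page-size bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES      512u
#define TABLE_BYTES  4096ULL
#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITE    (1ULL << 1)
#define PTE_PSE      (1ULL << 7)   /* entry is a large-page leaf */

/* tables:      buffer of (2 + n_gib) table pages, 512 words each
 * tables_phys: physical address at which that buffer will live
 * n_gib:       gigabytes to map, at most 512 (a single PDPT)
 * Hypothetical helper, for illustration only. */
void build_identity_map_2m(uint64_t *tables, uint64_t tables_phys,
                           unsigned n_gib)
{
    const uint64_t dir_flags = PTE_PRESENT | PTE_WRITE;
    uint64_t *pml4 = tables;
    uint64_t *pdpt = tables + ENTRIES;

    memset(tables, 0, (2 + (size_t)n_gib) * TABLE_BYTES);

    /* The single PML4 entry points at the PDPT. */
    pml4[0] = (tables_phys + TABLE_BYTES) | dir_flags;

    for (unsigned g = 0; g < n_gib; g++) {
        uint64_t *pd = tables + (2 + (size_t)g) * ENTRIES;

        /* Each PDPT entry points at the page directory for one gigabyte. */
        pdpt[g] = (tables_phys + (2 + (uint64_t)g) * TABLE_BYTES) | dir_flags;

        /* Each page directory entry maps 2M at virtual == physical. */
        for (unsigned i = 0; i < ENTRIES; i++) {
            uint64_t phys = ((uint64_t)g << 30) | ((uint64_t)i << 21);
            pd[i] = phys | dir_flags | PTE_PSE;
        }
    }
}
```

Nothing in the loop knows or cares about the low 640k; it is mapped like every other page.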
Weird things like copying off the 640k region for the kexec-on-panic
case can be done in the adapter/purgatory piece that lives between the
two kernels.
So at a very practical level I think we shouldn't have mappings for
special regions; we should just have mappings for all of memory.
KISS.
Eric