[Xen-devel] Xen kexec status

David Woodhouse dwmw2 at infradead.org
Sat Apr 27 11:30:01 PDT 2019

> On 27/04/2019 07:15, David Woodhouse wrote:
>> I've been looking at kexec into Xen, and from Xen.
>> Kexec-tools doesn't support Multiboot v2, and doesn't treat the Xen
>> image as relocatable. So it loads it at address zero, which causes lots
>> of amusement:
> Which binary are you trying to load?
> xen-syms gets converted into an elf32 which should be linked to run at
> 2M.  See mkelf32 and XEN_IMG_OFFSET

Yeah, looking closer it isn't Xen that's being loaded at zero...

>> Firstly, head.S trusts the low memory limit found in the BDA, which has
>> been scribbled on. Hacking around that and setting no-real-mode does
>> make kexec into Xen from Linux work.
> Do we know what scribbles on it?

... kexec in its wisdom is choosing to put something small (0xfe bytes or
so) there. Probably the multiboot info.

Telling it --mem-min=0x1000 should suffice.
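
For context, a minimal sketch (in Python, not taken from head.S) of why a
scribbled-on BDA matters. It assumes the conventional PC layout, where the
16-bit word at physical address 0x413 holds the base memory size in KiB and
the word at 0x40E holds the EBDA segment. The function names and the zeroed
"scribble" are made up for illustration.

```python
import struct

# Conventional BIOS Data Area offsets (physical addresses).
BDA_EBDA_SEGMENT = 0x40E   # 16-bit real-mode segment of the EBDA
BDA_BASE_MEM_KIB = 0x413   # 16-bit base memory size, in KiB


def low_memory_limit(low_mem):
    """Top of usable low memory in bytes, as reported by the BDA."""
    (kib,) = struct.unpack_from("<H", low_mem, BDA_BASE_MEM_KIB)
    return kib * 1024


def ebda_address(low_mem):
    """Physical address of the EBDA, derived from its real-mode segment."""
    (segment,) = struct.unpack_from("<H", low_mem, BDA_EBDA_SEGMENT)
    return segment << 4


# A plausible intact BDA: 639 KiB of base memory, EBDA at 0x9FC00.
intact = bytearray(0x1000)
struct.pack_into("<H", intact, BDA_BASE_MEM_KIB, 639)
struct.pack_into("<H", intact, BDA_EBDA_SEGMENT, 0x9FC0)

# Hypothetical scribble: the whole first page has been overwritten with zeros.
scribbled = bytearray(0x1000)

intact_limit = low_memory_limit(intact)
scribbled_limit = low_memory_limit(scribbled)
```

With the intact page the limit comes out just below 640KiB; with the
overwritten page it is whatever happens to be at 0x413, here zero, and any
code that trusts it will pick a nonsensical trampoline location.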

> For better or worse, the IVT needs to remain valid wherever possible to
> reduce the number of corner cases where an errant NMI/#MC will take out
> the entire system.
>> Secondly, kexec (in xen_kexec_load()) adds a mapping of the 0-1MiB
>> region, which "overlaps" with where Xen is actually loaded, so *Xen*
>> refuses the kexec_load hypercall.
> ISTR this being necessary for purgatory to function at the time David
> did the kexec work, but really it seems like a bug with the
> configuration of purgatory.

I have fixed this to fill in any gaps below 1MiB, but if we can ditch it,
all the better.
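
The gap-filling idea can be sketched roughly as follows. This is an
illustration in Python, not the actual kexec-tools change; the segment
representation (half-open start/end pairs) and the name gaps_below are
assumptions.

```python
ONE_MIB = 0x100000


def gaps_below(segments, limit=ONE_MIB):
    """Return the [start, end) ranges in [0, limit) that no segment covers.

    `segments` is an iterable of half-open (start, end) address ranges.
    """
    gaps = []
    cursor = 0  # lowest address not yet known to be covered
    for start, end in sorted(segments):
        if start >= limit:
            break  # everything from here on lies above the limit
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < limit:
        gaps.append((cursor, limit))
    return gaps


# Illustrative layout: two small low segments plus an image at 2MiB.
example_segments = [(0x1000, 0x2000), (0x9000, 0xA000), (0x200000, 0x300000)]
example_gaps = gaps_below(example_segments)
```

Mapping only the returned ranges covers the rest of the low 1MiB without
overlapping anything that is already being loaded there.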

>> For kexec from Xen I also reverted to kexec-tools 2.0.16 as commit
>> 894bea9335f57b62c ("kexec-tools: Perform run-time linking of
>> libxenctrl.so") seems to have broken things by not always defining
>> HAVE_LIBXENCTRL when it should. I'll fix that shortly.

That one appears to have been transient; it works now.

>> Most of the above is relatively simply worked around by hacking the Xen
>> image to be ET_DYN (so that kexec will relocate it) and then using
>> kexec --mem-min=0x100000. I'll probably implement Multiboot v2 support
>> in kexec-tools to allow for saner relocation.
> I think having MB2 support would be a very good move.  It also provides
> a better way to pass the UEFI details.
>> We should fix head.S. One option is to recognise when the load address
>> is zero, and automatically eschew the BDA and trigger the no-real-mode
>> behaviour when that is the case. Better suggestions welcome.
>> Should we also avoid having a load segment at offset zero in the image,
>> so that it doesn't scribble on the BDA by default?
> I don't think we should ever be loading a binary at 0, but it might be
> worth having a dedicated kexec entry point which can be more selective
> about what it does.

Compare the MB bootloader name with "kexec"? I note we already do that for

> The EFI and PVH entrypoints already set skip_realmode amongst other
> things.
> Another option might be to only use the BDA/EBDA in the absence of any
> memory map information.

The code seems to be fairly insistently not trusting the MB information. I
assumed there were reasons for that... but perhaps we could trust MB2?

>> Should we also fix Xen's kexec_load not to refuse overlapping segments
>> if they are not loaded (bufsz==0)? I'm not quite sure what's going on
>> there; doesn't this happen with paging disabled anyway, so why would we
>> need an explicit mapping of RAM?
> Do you have a dump of which segments are attempting to be loaded?  TBH,
> this sounds like fallout from the earlier issues, but it is also
> possible that we've got a bug in the overlap checks.

Indeed, the overlap is real and it's due to the earlier issues.
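
To make the failure concrete, here is a rough Python sketch of the kind of
overlap check involved. It is not Xen's implementation; the half-open range
representation, the name find_overlaps and the example addresses are all
assumptions for illustration.

```python
def find_overlaps(segments):
    """Return pairs of half-open (start, end) ranges that overlap."""
    ordered = sorted(segments)
    overlaps = []
    for i, (a_start, a_end) in enumerate(ordered):
        for b_start, b_end in ordered[i + 1:]:
            if b_start >= a_end:
                break  # sorted by start, so no later range can overlap either
            overlaps.append(((a_start, a_end), (b_start, b_end)))
    return overlaps


# Illustrative values: a small blob placed at zero, the 0-1MiB mapping,
# and an image loaded at 2MiB.
example = [(0x0, 0x1000), (0x0, 0x100000), (0x200000, 0x400000)]
example_overlaps = find_overlaps(example)
```

In this made-up layout the small blob at zero collides with the 0-1MiB
mapping, which is the shape of the problem described above; keeping the
blob out of that range makes the list come back empty.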

>> After that, I'm looking at using Xen as a crash kernel, which means I
>> really don't want it scribbling on low memory that it hasn't been
>> explicitly told it can use. First attempt at this is at
>> http://david.woodhou.se/0001-x86-boot-Use-trampoline_protmode_entry-in-place.patch
>> but as noted there, it only works for a single processor for now; I'll
>> fix it as described therein.
> I think it is well past time to (re)consider and strip down the early
> assembly code.  There are a number of at-best-questionable things, and
> it is extremely thick going.  (TBH, I'd also like to replace most of it
> with C, but doing that will first require understanding how it actually
> all works.)

I am... some way to understanding how it works. My current plan is to let
head.S relocate *only* the one-time temporary 16-bit boot code, and only
if !no-real-mode (where I may yet set no-real-mode automatically for a
kexec boot).

Then the wake-up and AP trampolines which need to live below 1MiB can be
put there later from __start_xen() once it knows for real which memory it
can use.

The 32-bit code can run from the Xen physical image at boot time on the
BSP, but that isn't mapped for the AP so it's probably easier to let that
get copied to low memory with the permanent 16-bit code. Just means I have
to re-relocate it but that's simple enough. I can re-unify the already
divergent EFI_LOADER code path here too.

> Currently, the main Xen image strictly needs to be located below the 4G
> boundary, so you are right that none of the 32bit code actually needs to
> be in the trampoline.  In principle it would be nice to lift this
> restriction, at which point we need all the code required to get into
> long mode in the trampoline.

Yeah, that shouldn't be hard. Kind of glad it doesn't yet work and I can
say "later" though :)

