kexec failures with DEBUG_RODATA

Wed Jun 15 00:55:08 PDT 2016

Hi Russell,

On 14/06/2016:06:59:20 PM, Russell King - ARM Linux wrote:
> Guys,
> 
> Having added Keystone2 support to kexec, and asking TI to validate
> linux-next with mainline kexec-tools, I received two reports from
> them.
> 
> The first was a report of success, but was kexecing a 4.4 kernel
> from linux-next.
> 
> The second was a failure report, kexecing current linux-next from
> linux-next on this platform.  However, my local tests (using my
> 4.7-rc3 derived kernel) showed there to be no problem.
> 
> Building my 4.7-rc3 derived kernel with TI's configuration they
> were using with linux-next similarly failed.  So, it came down to
> a configuration difference.
> 
> After trialling several configurations, it turns out that the
> failure is, in part, caused by CONFIG_DEBUG_RODATA being enabled
> on TI's kernel but not mine.  Why should this make any difference?

> 
> Well, CONFIG_DEBUG_RODATA has the side effect that the kernel
> contains a lot of additional padding - we pad out to section size
> (1MB) the ELF sections with differing attributes.  This should not
> normally be a problem, except kexec contains this assumption:
> 
>                 /* Otherwise, assume the maximum kernel compression ratio
>                  * is 4, and just to be safe, place ramdisk after that */
>                 initrd_base = base + _ALIGN(len * 4, getpagesize());
> 
> Now, first things first.  Don't get misled by the comment - it's
> totally false.  That may be what's desired, but that is far from
> what actually happens in reality.
> 
> "base" is _not_ the address of the start of the kernel image, but
> is the base address of the start of the region that the kernel is
> to be loaded into - remember that the kernel is normally loaded
> 32k higher than the start of memory.  This 32k offset is _not_
> included in either "base" nor "len".  So, even if we did want to
> assume that there was a maximum compression ratio of 4, the above
> always calculates 32k short of that value.
> 
> The other invalid thing here is this whole "maximum kernel compression
> ratio" assumption.  Consider this non-DEBUG_RODATA kernel image:
> 
>    text    data     bss     dec     hex filename
> 6583513 2273816  215344 9072673  8a7021 ../build/ks2/vmlinux
> 
> This results in an image and zimage of:
> -rwxrwxr-x 1 rmk rmk 8871936 Jun 14 18:02 ../build/ks2/arch/arm/boot/Image
> -rwxrwxr-x 1 rmk rmk 4381592 Jun 14 18:02 ../build/ks2/arch/arm/boot/zImage
> 
> which is a ratio of about a 49%.  On entry to the decompressor, the
> compressed image will be relocated above the expected resulting
> kernel size.  So, let's say that it's relocated to 9MB.  This means
> the zImage will occupy around 9MB-14MB above the start of memory.
> Going by the 4x ratio, we place the other images at 16.7MB.  This
> leaves around 2.7MB free.  So that's probably fine... but think
> about this.  We assumed a ratio of 4x, but really we're in a rather
> tight squeeze - we actually have only about 50% of the compressed
> image size spare.
> 
> Now let's look at the DEBUG_RODATA case:
> 
>    text    data     bss     dec     hex filename
> 6585305 2273952  215344 9074601  8a77a9 ../build/ks2/vmlinux
> 
> And the resulting sizes:
> -rwxrwxr-x 1 rmk rmk 15024128 Jun 14 18:49 ../build/ks2/arch/arm/boot/Image
> -rwxrwxr-x 1 rmk rmk  4399040 Jun 14 18:49 ../build/ks2/arch/arm/boot/zImage
> 
> That's a compression ratio of about 29%.  Still within the 4x limit,
> but going through the same calculation above shows that we end up
> totally overflowing the available space this time.
> 
> That's exactly the same kernel configuration except for
> CONFIG_DEBUG_RODATA - enabling this has almost _doubled_ the
> decompressed image size without affecting the compressed size.
> 
> We've known for some time that this ratio of 4x doesn't work - we
> used to use the same assumption in the decompressor when self-
> relocating, and we found that there are images which achieve a
> better compression ratio and make this invalid.  Yet, the 4x thing
> has persisted in kexec code... and buggily too.
> 
> Since the kernel now has CONFIG_DEBUG_RODATA by default, this means
> that these kinds of ratio-based assumptions are even more invalid
> than they have been.
> 
> Right now, a zImage doesn't advertise the size of its uncompressed
> image, but I think with things like CONFIG_DEBUG_RODATA, we can no
> longer make assumptions like we have done in the past, and we need
> the zImage to provide this information so that the boot environment
> can be setup sanely by boot loaders/kexec rather than relying on
> broken heuristics like this.
> 
> Thoughts?

Sure, having a header information would be handy to do it. Other alternative
could be that we define "HAVE_LIBZ" and then we can have something like
kexec-Image-arm.c which handles plane Image.  We can also have something like
get_zlib_decompressed_length() which can give us exact length we need for kernel
and then we can place initrd accordingly in zImage_arm_load().

Other than that:

I see at least another issue clearly in ARM kernel code with CONFIG_DEBUG_RODATA
enabled.  When CONFIG_DEBUG_RODATA is enabled, we can not write text area.
kexec_start_address has been defined in relocate_kernel.S as a text area.
machine_kexec() writes at kexec_start_address with image->start. Similarly there
would be issues for overwriting of kexec_indirection_page, kexec_mach_type and
kexec_boot_atags. If arm mmu mapping configures text pages as RO, then we should
see an abort as soon as we do these writes.

I think there could be two alternatives to fix it (i) We pass all of above
values as function argument to soft_restart() till relocate_new_kernel().  (ii)
Remove kexec_start_address and others from relocate_new_kernel(). Now in
machine_kexec(), copy kexec_start_address and friends directly to
(reboot_code_buffer+relocate_new_kernel_size)

~Pratyush