kexec cannot find text map area if kaslr is enabled

Eric W. Biederman ebiederm at xmission.com
Thu Oct 17 15:58:30 EDT 2013


HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> writes:

> Hello,
>
> I tried the x86/kaslr branch to check how it works with the kdump
> framework.

As far as I can tell x86/kaslr is a pretty silly idea.  There don't seem
to be enough bits to make it hard to brute force, much less hard to
guess.  And it is a lot of pain to get there... Sigh.

> I found kexec doesn't work. According to the message, it looks like kexec is failing
> to find the kernel text map area in kcore.

Well, kexec -p doesn't work.

> $ sudo /sbin/kexec -p --command-line="ro root=UUID=cdd5e357-d223-47ee-9d6e-d1fa78b3f8a4 rd_NO_LUKS nodmraid rd_NO_MD KEYBOARDTYPE=pc KEYTABLE=jp106 LANG=ja_JP.UTF-8 rd_NO_LVM rd_NO_DM \
> console=ttyS0,19200n8r trace_event=block:*,irq:*,mce:*,sched:*,signal:*,workqueue:*,scsi:* trace_buf_size=25165824 irqpoll nr_cpus=2 reset_devices cgroup_disable=memory mce=off enable_lazy_purge " \
> --initrd=/boot/initrd-3.12.0-rc4-kaslrkdump.img /boot/vmlinuz-3.12.0-rc4-kaslr
> Can't find kernel text map area from kcore
> Cannot load /boot/vmlinuz-3.12.0-rc4-kaslr
>
> From the source code, it looks like kexec is trying to find the text map area using the
> hard-coded __START_KERNEL_map address. But this is being altered by kaslr.

Looking at the code you have found, the hard coded address of -2G is
fine, and actually required by the compiler.  The actual problem
appears to be that the structure of the kernel mapping has changed.
There are now two mappings in the -2GB range, one of 10MiB and one
of 1024MiB, whereas the code was looking for a mapping of 512MiB.
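
The failing comparison can be reproduced with a small sketch. It is not
kexec-tools code: the struct and helper names are made up for
illustration, the constants are assumed from the values discussed in this
thread (-2GiB base, 512MiB window), and the segment table is copied from
the readelf output quoted further down. With those numbers the mapping at
0xffffffffa3000000 ends at 0xffffffffa3ed4000, past the window limit of
0xffffffffa0000000, so nothing matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants, taken from the values discussed in this thread. */
#define START_KERNEL_MAP 0xffffffff80000000ULL   /* -2GiB */
#define TEXT_WINDOW_512M (512ULL * 1024 * 1024)  /* window the check allows */

/* Hypothetical type: just the two program header fields the check uses. */
struct load_seg {
	uint64_t vaddr;
	uint64_t memsz;
};

/* PT_LOAD entries at or above -2GiB, in the order readelf listed them. */
static const struct load_seg kcore_high_segs[] = {
	{ 0xffffffffff600000ULL, 0x0000000000800000ULL },
	{ 0xffffffffa3000000ULL, 0x0000000000ed4000ULL },
	{ 0xffffffffc0000000ULL, 0x000000003f000000ULL },
};

/* Same comparison as the quoted loop: the whole segment must lie inside
 * [START_KERNEL_MAP, START_KERNEL_MAP + window]. */
static int fits_text_window(const struct load_seg *s, uint64_t window)
{
	uint64_t eaddr = s->vaddr + s->memsz;

	return s->vaddr >= START_KERNEL_MAP &&
	       eaddr <= START_KERNEL_MAP + window;
}

/* Count how many of the given segments pass the check. */
static size_t count_matches(const struct load_seg *segs, size_t n,
			    uint64_t window)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (fits_text_window(&segs[i], window))
			hits++;
	return hits;
}
```

Calling count_matches(kcore_high_segs, 3, TEXT_WINDOW_512M) gives zero
hits, which corresponds to the "Can't find kernel text map area" path.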

The entire bit of code is just for pretty printing the core and I
suspect could be done more robustly, possibly by reporting all of the
kernel vaddrs of the mappings.

I expect you could increase X86_64_KERNEL_TEXT_SIZE to 2GiB - 1, aka
0x7fffffff, and the code would work.  I don't know if you would have a
recognizable text segment in the core dump.
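
A rough sketch of what widening the window does, under the same
assumptions as the thread (base of -2GiB, segments copied from the quoted
readelf output, made-up helper names). With a window of 0x7fffffff the
upper limit becomes 0xffffffffffffffff, so every mapping above -2GiB
passes, and a first-match loop picks whichever one is listed first. In
the quoted listing that is the 8MiB mapping at 0xffffffffff600000, not
the one at 0xffffffffa3000000, which is one reason the result may not be
a recognizable text segment.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants, taken from the values discussed in this thread. */
#define START_KERNEL_MAP  0xffffffff80000000ULL  /* -2GiB */
#define TEXT_WINDOW_WIDE  0x7fffffffULL          /* 2GiB - 1 */

/* Hypothetical type: just the two program header fields the check uses. */
struct load_seg {
	uint64_t vaddr;
	uint64_t memsz;
};

/* PT_LOAD entries at or above -2GiB, in the order readelf listed them. */
static const struct load_seg kcore_high_segs[] = {
	{ 0xffffffffff600000ULL, 0x0000000000800000ULL },
	{ 0xffffffffa3000000ULL, 0x0000000000ed4000ULL },
	{ 0xffffffffc0000000ULL, 0x000000003f000000ULL },
};

/* Return the index of the first segment that lies entirely inside
 * [START_KERNEL_MAP, START_KERNEL_MAP + window], or -1 if none does.
 * This mirrors the "return on first hit" shape of the quoted loop. */
static int first_match(const struct load_seg *segs, size_t n, uint64_t window)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t eaddr = segs[i].vaddr + segs[i].memsz;

		if (segs[i].vaddr >= START_KERNEL_MAP &&
		    eaddr <= START_KERNEL_MAP + window)
			return (int)i;
	}
	return -1;
}
```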

I believe ultimately what we want is to have an elf image with all of
the same PT_LOAD segments as /proc/kcore, and the current implementation
is not general enough to do that.  So this probably makes a good
opportunity to rewrite it.
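
As a starting point for such a rewrite, here is a minimal sketch of
collecting every PT_LOAD header from a 64-bit ELF file such as
/proc/kcore instead of stopping at the first one inside a fixed window.
read_load_segments() is a hypothetical helper, not an existing
kexec-tools function, and it assumes a Linux <elf.h> and a file whose
program header entries have the native Elf64_Phdr size.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy up to max PT_LOAD program headers from the
 * 64-bit ELF file at path into out[]. Returns the number copied, or -1
 * if the file cannot be read or is not a 64-bit ELF file. */
static int read_load_segments(const char *path, Elf64_Phdr *out, int max)
{
	Elf64_Ehdr ehdr;
	Elf64_Phdr phdr;
	FILE *fp = fopen(path, "rb");
	int i, n = 0;

	if (!fp)
		return -1;
	if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1 ||
	    memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
	    ehdr.e_ident[EI_CLASS] != ELFCLASS64 ||
	    ehdr.e_phentsize != sizeof(phdr)) {
		fclose(fp);
		return -1;
	}
	/* Walk the program header table and keep every PT_LOAD entry. */
	for (i = 0; i < ehdr.e_phnum; i++) {
		long off = (long)(ehdr.e_phoff + (Elf64_Off)i * sizeof(phdr));

		if (fseek(fp, off, SEEK_SET) != 0 ||
		    fread(&phdr, sizeof(phdr), 1, fp) != 1) {
			fclose(fp);
			return -1;
		}
		if (phdr.p_type == PT_LOAD && n < max)
			out[n++] = phdr;
	}
	fclose(fp);
	return n;
}
```

A caller could then emit one header in the dump image per entry in
out[], using each entry's p_vaddr and p_memsz as they are.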

It may also make sense to have some information from /proc/kallsyms.  We
aren't doing that on i386 and have something that works, so I suspect
the same logic will work on x86_64.  At least until it is decided that
the best way to load the kernel is to randomly reorder and relink all of
the .o's in the kernel at boot time.
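
For the /proc/kallsyms idea, a lookup could be sketched roughly as
follows. lookup_symbol() is a hypothetical helper, not something that
exists in kexec-tools. It assumes the usual "address type name" line
layout, and that the reader has enough privilege to see real addresses
(depending on kptr_restrict, unprivileged readers may see zeros).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan kallsyms-style lines of the form
 * "<hex address> <type char> <name>" and store the address of the first
 * symbol called name in *addr. Returns 0 if found, -1 otherwise. */
static int lookup_symbol(FILE *fp, const char *name, unsigned long long *addr)
{
	char line[512], sym[256], type;
	unsigned long long a;

	while (fgets(line, sizeof(line), fp)) {
		/* Anything after the name, such as "[module]", is ignored. */
		if (sscanf(line, "%llx %c %255s", &a, &type, sym) == 3 &&
		    strcmp(sym, name) == 0) {
			*addr = a;
			return 0;
		}
	}
	return -1;
}
```

Opening /proc/kallsyms with fopen() and asking for a symbol such as
_stext would give the runtime start of the kernel text.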

Eric

> static int get_kernel_vaddr_and_size(struct kexec_info *UNUSED(info),
>                                      struct crash_elf_info *elf_info)
> <cut>
>         /* Traverse through the Elf headers and find the region where
>          * kernel is mapped. */
>         end_phdr = &ehdr.e_phdr[ehdr.e_phnum];
>         for(phdr = ehdr.e_phdr; phdr != end_phdr; phdr++) {
>                 if (phdr->p_type == PT_LOAD) {
>                         unsigned long long saddr = phdr->p_vaddr;
>                         unsigned long long eaddr = phdr->p_vaddr + phdr->p_memsz;
>                         unsigned long long size;
>
>                         /* Look for kernel text mapping header. */
>                         if ((saddr >= X86_64__START_KERNEL_map) &&
>                             (eaddr <= X86_64__START_KERNEL_map + X86_64_KERNEL_TEXT_SIZE)) {
>                                 saddr = _ALIGN_DOWN(saddr, X86_64_KERN_VADDR_ALIGN);
>                                 elf_info->kern_vaddr_start = saddr;
>                                 size = eaddr - saddr;
>                                 /* Align size to page size boundary. */
>                                 size = _ALIGN(size, align);
>                                 elf_info->kern_size = size;
>                                 dbgprintf("kernel vaddr = 0x%llx size = 0x%llx\n",
>                                         saddr, size);
>                                 return 0;
>                         }
>                 }
>         }
>         fprintf(stderr, "Can't find kernel text map area from kcore\n");
>         return -1;
>
> It seems to me that kexec needs to get runtime relocation information, for example
> from /proc/kallsyms.
>
> I think there are other parts that don't work well due to this kind of hard-coded address.
>
> FYI, here is part of the /proc/iomem and /proc/kcore information from my environment:
>
> $ readelf -l /proc/kcore
> Elf file type is CORE (Core file)
> Entry point 0x0
> There are 11 program headers, starting at offset 64
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   NOTE           0x00000000000002a8 0x0000000000000000 0x0000000000000000
>                  0x0000000000000c74 0x0000000000000000         0
>   LOAD           0x00007fffff601000 0xffffffffff600000 0x0000000000000000
>                  0x0000000000800000 0x0000000000800000  RWE    1000
>   LOAD           0x00007fffa3001000 0xffffffffa3000000 0x0000000000000000
>                  0x0000000000ed4000 0x0000000000ed4000  RWE    1000
>   LOAD           0x0000490000001000 0xffffc90000000000 0x0000000000000000
>                  0x00001fffffffffff 0x00001fffffffffff  RWE    1000
>   LOAD           0x00007fffc0001000 0xffffffffc0000000 0x0000000000000000
>                  0x000000003f000000 0x000000003f000000  RWE    1000
>   LOAD           0x0000080000002000 0xffff880000001000 0x0000000000000000
>                  0x000000000009a000 0x000000000009a000  RWE    1000
>   LOAD           0x00006a0000001000 0xffffea0000000000 0x0000000000000000
>                  0x0000000000003000 0x0000000000003000  RWE    1000
>   LOAD           0x0000080000101000 0xffff880000100000 0x0000000000000000
>                  0x000000007af0d000 0x000000007af0d000  RWE    1000
>   LOAD           0x00006a0000004000 0xffffea0000003000 0x0000000000000000
>                  0x0000000001ae6000 0x0000000001ae6000  RWE    1000
>   LOAD           0x0000080100001000 0xffff880100000000 0x0000000000000000
>                  0x0000000780000000 0x0000000780000000  RWE    1000
>   LOAD           0x00006a0003801000 0xffffea0003800000 0x0000000000000000
>                  0x000000001a400000 0x000000001a400000  RWE    1000
>
> 00000000-00000fff : reserved
> 00001000-0009afff : System RAM
> 0009b000-0009ffff : reserved
> 000a0000-000bffff : PCI Bus 0000:00
> 000c0000-000c7fff : Video ROM
> 000c8000-000c8fff : Adapter ROM
> 000c9000-000cefff : Adapter ROM
> 000e0000-000fffff : reserved
>   000f0000-000fffff : System ROM
> 00100000-7b00cfff : System RAM
>   03000000-22ffffff : Crash kernel
>   23000000-2355118e : Kernel code
>   2355118f-23af95ff : Kernel data
>   23cb2000-23eadfff : Kernel bss
> 7b00d000-7b00ffff : reserved
> 7b010000-7b65efff : ACPI Non-volatile Storage
> 7b65f000-7b681fff : ACPI Tables
> 7b682000-7b7bffff : reserved
> 7b7c0000-7ba3ffff : ACPI Non-volatile Storage
> 7ba40000-7baaafff : reserved
> 7baab000-7bcfffff : ACPI Tables
> 7bd00000-7bd12fff : reserved
> 7bd13000-7bd15fff : ACPI Tables
> 7bd16000-7bd45fff : reserved
> 7bd46000-7bd5efff : ACPI Tables
> 7bd5f000-7bdfefff : reserved
> 7bdff000-7bdfffff : ACPI Tables
> 7be00000-7be4efff : reserved
>   7be1b018-7be1b067 : APEI ERST
>   7be1b070-7be1b077 : APEI ERST
>   7be1b078-7be1d017 : APEI ERST
> 7be4f000-7bf83fff : ACPI Tables
> 7bf84000-7bfcefff : ACPI Non-volatile Storage
> 7bfcf000-7bffefff : ACPI Tables
> 7bfff000-8fffffff : reserved
>   80000000-8fffffff : PCI MMCONFIG 0000 [bus 00-ff]
> 90000000-afffffff : PCI Bus 0000:00
> <cut>


