[Query] PAGE_OFFSET on KASLR enabled ARM64 kernel

Bhupesh Sharma bhsharma at redhat.com
Fri Jun 1 14:52:46 PDT 2018


On Sat, Jun 2, 2018 at 3:11 AM, Bhupesh Sharma <bhsharma at redhat.com> wrote:
> On 05/31/2018 10:21 AM, Bhupesh Sharma wrote:
>>
>> Hi Ard,
>>
>> Sorry I was out for most of the day yesterday. Please see my responses
>> inline.
>>
>> On Mon, May 28, 2018 at 12:16 PM, Ard Biesheuvel
>> <ard.biesheuvel at linaro.org> wrote:
>>>
>>> On 27 May 2018 at 23:03, Bhupesh Sharma <bhsharma at redhat.com> wrote:
>>>>
>>>> Hi ARM64 maintainers,
>>>>
>>>> I am confused about the PAGE_OFFSET value (or the start of the linear
>>>> map) on a KASLR enabled ARM64 kernel that I am seeing on a board which
>>>> supports a compatible EFI firmware (with EFI_RNG_PROTOCOL support).
>>>>
>>>> 1. 'arch/arm64/include/asm/memory.h' defines PAGE_OFFSET as:
>>>>
>>>> /*
>>>>   * PAGE_OFFSET - the virtual address of the start of the linear map (top
>>>>   *         (VA_BITS - 1))
>>>>   */
>>>> #define PAGE_OFFSET        (UL(0xffffffffffffffff) - \
>>>>      (UL(1) << (VA_BITS - 1)) + 1)
>>>>
>>>> So for example on a platform with VA_BITS=48, we have:
>>>> PAGE_OFFSET = 0xffff800000000000
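
As a quick check of the arithmetic above, here is a minimal user-space sketch that evaluates the same expression for a given VA_BITS. The helper name page_offset_for() is hypothetical and not part of the kernel; it only mirrors the quoted macro.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the quoted PAGE_OFFSET macro:
 * all-ones minus 2^(va_bits - 1), plus one.
 */
static uint64_t page_offset_for(unsigned int va_bits)
{
	return UINT64_MAX - (UINT64_C(1) << (va_bits - 1)) + 1;
}
```

Calling page_offset_for(48) yields 0xffff800000000000, matching the value given above.
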
>>>>
>>>> 2. However, for the KASLR case, we set 'memstart_offset_seed' from
>>>> the top 16 bits of the 'kaslr-seed' to randomize the linear region in
>>>> 'arch/arm64/kernel/kaslr.c':
>>>>
>>>> u64 __init kaslr_early_init(u64 dt_phys)
>>>> {
>>>> <snip..>
>>>>      /* use the top 16 bits to randomize the linear region */
>>>>      memstart_offset_seed = seed >> 48;
>>>> <snip..>
>>>> }
>>>>
>>>> 3. Now, we use the 'memstart_offset_seed' value to randomize the
>>>> 'memstart_addr' value in 'arch/arm64/mm/init.c':
>>>>
>>>> void __init arm64_memblock_init(void)
>>>> {
>>>> <snip..>
>>>>
>>>>      if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
>>>>          extern u16 memstart_offset_seed;
>>>>          u64 range = linear_region_size -
>>>>                  (memblock_end_of_DRAM() - memblock_start_of_DRAM());
>>>>
>>>>          /*
>>>>           * If the size of the linear region exceeds, by a sufficient
>>>>           * margin, the size of the region that the available physical
>>>>           * memory spans, randomize the linear region as well.
>>>>           */
>>>>          if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
>>>>              range = range / ARM64_MEMSTART_ALIGN + 1;
>>>>              memstart_addr -= ARM64_MEMSTART_ALIGN *
>>>>                       ((range * memstart_offset_seed) >> 16);
>>>>          }
>>>>      }
>>>> <snip..>
>>>> }
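
To make the arithmetic in the quoted block concrete, the following is a minimal user-space sketch of the same calculation. The function name randomize_memstart() and its parameters are hypothetical; the kernel takes these values from memblock and from its own constants rather than as arguments.

```c
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the quoted randomization step.
 * dram_span stands in for memblock_end_of_DRAM() - memblock_start_of_DRAM(),
 * align for ARM64_MEMSTART_ALIGN, seed for memstart_offset_seed.
 */
static uint64_t randomize_memstart(uint64_t memstart_addr,
				   uint64_t linear_region_size,
				   uint64_t dram_span,
				   uint64_t align,
				   uint16_t seed)
{
	uint64_t range = linear_region_size - dram_span;

	/* Only randomize when there is at least one alignment unit of slack. */
	if (seed > 0 && range >= align) {
		range = range / align + 1;
		/* Unsigned subtraction: the result may wrap below zero. */
		memstart_addr -= align * ((range * seed) >> 16);
	}
	return memstart_addr;
}
```

With purely illustrative inputs (a 2^47-byte linear region, a 96 GB DRAM span, 1 GB alignment, seed 0x8000 and memstart_addr starting at 0), the result is 0xffffc00c00000000. The subtraction wraps, so the value can look like a kernel virtual address even though memstart_addr nominally holds a physical base.
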
>>>>
>>>> 4. Since 'memstart_addr' indicates the start of physical RAM, we
>>>> randomize it on the basis of the 'memstart_offset_seed' value above.
>>>> Also, the 'memstart_addr' symbol is listed in '/proc/kallsyms', so
>>>> user-space applications can use it to read the 'memstart_addr'
>>>> value.
>>>>
>>>> 5. Since the PAGE_OFFSET value is also used by several user-space
>>>> tools (e.g. makedumpfile uses it to determine the start of the
>>>> linear region and hence to read PT_NOTE fields from /proc/kcore), I
>>>> am not sure how to read its randomized value in the KASLR-enabled
>>>> case.
>>>>
>>>> 6. Reading the code further and adding some debug prints, it seems the
>>>> 'memblock_start_of_DRAM()' value is closer to the actual start of the
>>>> linear region than 'memstart_addr' and 'PAGE_OFFSET' in the case of a
>>>> KASLR-enabled kernel:
>>>>
>>>> [root at qualcomm-amberwing] # dmesg | grep -i "arm64_memblock_init" -A 5
>>>>
>>>> [    0.000000] inside arm64_memblock_init, memstart_addr =
>>>> ffff976a00000000,
>>>> linearstart_addr = ffffe89600200000, memblock_start_of_DRAM =
>>>> ffffe89600200000,
>>>> PHYS_OFFSET = ffff976a00000000, PAGE_OFFSET = ffff800000000000,
>>>> KIMAGE_VADDR = ffff000008000000, kimage_vaddr = ffff20c2d7800000
>>>>
>>>> [root at qualcomm-amberwing] # dmesg | grep -i "Virtual kernel memory layout" -A 15
>>>> [    0.000000] Virtual kernel memory layout:
>>>> [    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
>>>> [    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000   (126847 GB)
>>>> [    0.000000]       .text : 0xffff20c2d7880000 - 0xffff20c2d8040000   (  7936 KB)
>>>> [    0.000000]     .rodata : 0xffff20c2d8040000 - 0xffff20c2d83a0000   (  3456 KB)
>>>> [    0.000000]       .init : 0xffff20c2d83a0000 - 0xffff20c2d8750000   (  3776 KB)
>>>> [    0.000000]       .data : 0xffff20c2d8750000 - 0xffff20c2d891b200   (  1837 KB)
>>>> [    0.000000]        .bss : 0xffff20c2d891b200 - 0xffff20c2d90a5198   (  7720 KB)
>>>> [    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000   (  4544 KB)
>>>> [    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000   (    16 MB)
>>>> [    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000   (   128 GB maximum)
>>>> [    0.000000]               0xffff7ffa25800800 - 0xffff7ffa2b800000   (    95 MB actual)
>>>> [    0.000000]     memory  : 0xffffe89600200000 - 0xffffe8ae00000000   ( 98302 MB)
>>>>
>>>> As one can see above, the 'memblock_start_of_DRAM()' value of
>>>> 0xffffe89600200000 represents the start of linear region:
>>>>
>>>> [    0.000000]     memory  : 0xffffe89600200000 - 0xffffe8ae00000000   ( 98302 MB)
>>>>
>>>> So, my question is: to access the start of the linear region (which
>>>> was earlier determinable via the PAGE_OFFSET macro), should I:
>>>>
>>>> - do some back-computation for the start of the linear region from
>>>> 'memstart_addr' in user space, or
>>>> - use a new global variable in the kernel which is assigned the value
>>>> of 'memblock_start_of_DRAM()' and is visible in '/proc/kallsyms', so
>>>> that it can be read by user-space tools, or
>>>> - should we rather look at removing the PAGE_OFFSET usage from the
>>>> kernel and replacing it with a global variable which is properly
>>>> updated for the KASLR case as well?
>>>>
>>>> Kindly share your opinions on what can be a suitable solution in this
>>>> case.
>>>>
>>>> Thanks for your help.
>>>>
>>>
>>> Hello Bhupesh,
>>>
>>> Could you explain what the relevance is of PAGE_OFFSET to userland?
>>> The only thing that should matter is where the actual linear mapping
>>> of DRAM is, and I am not sure I understand why we care about where it
>>> resides relative to the base of the linear region.
>>
>>
>> Actually, certain user-space tools like makedumpfile (which is used to
>> generate and compress the vmcore) and crash-utility (which is used to
>> debug the vmcore) rely on the PAGE_OFFSET value (which denotes the
>> base of the linear map region) to determine the virtual-to-physical
>> mapping of addresses lying in the linear region.
>>
>> One specific use case that I am working on at the moment is the
>> makedumpfile '--mem-usage' option, which shows the number of pages of
>> the current system (1st kernel) in each kind of use (please see
>> MAKEDUMPFILE(8) for more details).
>>
>> Using this we can know how many pages are dumpable for each
>> dump_level specified when invoking makedumpfile.
>>
>> Normally, makedumpfile analyses the contents of '/proc/kcore' (while
>> excluding the crashkernel range), and then counts the pages of each
>> kind per vmcoreinfo.
>>
>> For example, here is the output from my arm64 board (a non-KASLR boot):
>>
>>      TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
>>      ----------------------------------------------------------------------
>>      ZERO            49524                   yes             Pages filled with zero
>>      NON_PRI_CACHE   15143                   yes             Cache pages without private flag
>>      PRI_CACHE       29147                   yes             Cache pages with private flag
>>      USER            3684                    yes             User process pages
>>      FREE            1450569                 yes             Free pages
>>      KERN_DATA       14243                   no              Dumpable kernel data
>>
>>      page size:              65536
>>      Total pages on system:  1562310
>>      Total size on system:   102387548160     Byte
>>
>> This use case requires directly reading '/proc/kcore', and hence the
>> PAGE_OFFSET value is used to determine the base address of the linear
>> region, whose value is not static in the case of a KASLR boot.
>>
>> Another use case is where the crash-utility uses the PAGE_OFFSET value
>> to perform a virtual-to-physical conversion for addresses lying in
>> the linear region:
>>
>> ulong
>> arm64_VTOP(ulong addr)
>> {
>>      if (machdep->flags & NEW_VMEMMAP) {
>>          if (addr >= machdep->machspec->page_offset)
>>              return machdep->machspec->phys_offset
>>                  + (addr - machdep->machspec->page_offset);
>>
>> <..snip..>
>> }
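
The conversion in the quoted branch can be checked by hand with the numbers from the debug print earlier in this thread. Below is a minimal sketch; linear_vtop() is a hypothetical stand-alone helper, not crash-utility code, and whether crash-utility would see exactly these page_offset and phys_offset values on the KASLR boot is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper: linear-map virtual-to-physical conversion,
 * same arithmetic as the quoted arm64_VTOP() branch.
 */
static uint64_t linear_vtop(uint64_t vaddr, uint64_t page_offset,
			    uint64_t phys_offset)
{
	/* Unsigned 64-bit arithmetic, so a "negative" phys_offset wraps. */
	return phys_offset + (vaddr - page_offset);
}
```

Feeding in vaddr = 0xffffe89600200000, page_offset = 0xffff800000000000 and phys_offset = 0xffff976a00000000 (the PHYS_OFFSET printed above) gives 0x200000, because the 64-bit addition wraps around.
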
>>
>
> Another confusing concept is the rounded-down value of 'memstart_addr' in
> 'arch/arm64/mm/init.c' when booting a non-KASLR kernel and when the value
> of memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN:
>
> void __init arm64_memblock_init(void)
> {
>
> <..snip..>
>         /*
>          * Select a suitable value for the base of physical memory.
>          */
>         memstart_addr = round_down(memblock_start_of_DRAM(),
>                                    ARM64_MEMSTART_ALIGN);
> <..snip..>
> }
>
> For example, let's consider a case (which I see on my Qualcomm board) where
> memblock_start_of_DRAM() = 0x200000 and ARM64_MEMSTART_ALIGN = 0x40000000 (I
> am using VA_BITS = 48 and a 64K page size). In this case memstart_addr is
> calculated as 0, because the round_down() results in a value of 0.
>
> This is in contrast with the definition of the 'memblock_start_of_DRAM':
>
> /* lowest address */
> phys_addr_t __init_memblock memblock_start_of_DRAM(void)
> {
>         return memblock.memory.regions[0].base;
> }
>
> As indicated by the logs below, the first memblock region base starts from
> 0x200000 rather than the 'memstart_addr' value (which is 0):
>
> # dmesg | grep -i "Processing" -A 5
> [    0.000000] efi: Processing EFI memory map:
> [    0.000000] efi:   0x000000200000-0x00000021ffff [Runtime Data |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> [    0.000000] efi:   0x000000400000-0x0000005fffff [ACPI Memory NVS |   |  |  |  |  |  |  |   |  |  |  |UC]
>
> # head -1 /proc/iomem
> 00200000-0021ffff : reserved
>
> Since we define 'PHYS_OFFSET' as the physical address of the start of memory,
> it would be 0 in this case:
>
> /* PHYS_OFFSET - the physical address of the start of memory. */
> #define PHYS_OFFSET             ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>
> On the other hand, the first memblock starts from 0x200000, so my question
> is whether we should update the user-space tools that currently obtain
> PHYS_OFFSET from the memblocks listed in '/proc/iomem' (by reading the base
> of the first memblock) so that they instead read the value of
> 'memstart_addr' somehow in user space, or whether the change should be made
> on the kernel side to calculate 'memstart_addr' as:
>
>
>         /*
>          * Select a suitable value for the base of physical memory.
>          */
>         memstart_addr = round_down(memblock_start_of_DRAM(),
>                                    ARM64_MEMSTART_ALIGN);
>         if (memstart_addr)

Sorry for the typo: I meant if (!memstart_addr) above

Regards,
Bhupesh

>                 memstart_addr = memblock_start_of_DRAM();
>
> Please share your views.
>
> Thanks,
> Bhupesh
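
With the typo correction above applied (i.e. 'if (!memstart_addr)'), the difference between the current and the proposed calculation can be sketched in user space as follows. All three function names are hypothetical; round_down_pow2() assumes a power-of-two alignment and only stands in for the kernel's round_down().

```c
#include <stdint.h>

/* Hypothetical stand-in for round_down() with a power-of-two alignment. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

/* Current behaviour quoted from arm64_memblock_init(). */
static uint64_t memstart_current(uint64_t dram_start, uint64_t align)
{
	return round_down_pow2(dram_start, align);
}

/* Proposed behaviour, using the corrected !memstart_addr condition. */
static uint64_t memstart_proposed(uint64_t dram_start, uint64_t align)
{
	uint64_t memstart_addr = round_down_pow2(dram_start, align);

	/* Fall back to the real DRAM base when rounding collapses to 0. */
	if (!memstart_addr)
		memstart_addr = dram_start;
	return memstart_addr;
}
```

For the board discussed here (DRAM start 0x200000, alignment 0x40000000) the current calculation gives 0 and the proposed one gives 0x200000. For a DRAM start above the alignment, both give the same rounded-down value.
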


