[Query] PAGE_OFFSET on KASLR enabled ARM64 kernel

Bhupesh Sharma bhsharma at redhat.com
Fri Jun 1 14:41:44 PDT 2018


On 05/31/2018 10:21 AM, Bhupesh Sharma wrote:
> Hi Ard,
> 
> Sorry I was out for most of the day yesterday. Please see my responses inline.
> 
> On Mon, May 28, 2018 at 12:16 PM, Ard Biesheuvel
> <ard.biesheuvel at linaro.org> wrote:
>> On 27 May 2018 at 23:03, Bhupesh Sharma <bhsharma at redhat.com> wrote:
>>> Hi ARM64 maintainers,
>>>
>>> I am confused about the PAGE_OFFSET value (or the start of the linear
>>> map) on a KASLR enabled ARM64 kernel that I am seeing on a board which
>>> supports a compatible EFI firmware (with EFI_RNG_PROTOCOL support).
>>>
>>> 1. 'arch/arm64/include/asm/memory.h' defines PAGE_OFFSET as:
>>>
>>> /*
>>>   * PAGE_OFFSET - the virtual address of the start of the linear map (top
>>>   *         (VA_BITS - 1))
>>>   */
>>> #define PAGE_OFFSET        (UL(0xffffffffffffffff) - \
>>>      (UL(1) << (VA_BITS - 1)) + 1)
>>>
>>> So for example on a platform with VA_BITS=48, we have:
>>> PAGE_OFFSET = 0xffff800000000000
>>>
>>> 2. However, for the KASLR case, we set 'memstart_offset_seed' from the
>>> top 16 bits of the 'kaslr-seed' to randomize the linear region in
>>> 'arch/arm64/kernel/kaslr.c':
>>>
>>> u64 __init kaslr_early_init(u64 dt_phys)
>>> {
>>> <snip..>
>>>      /* use the top 16 bits to randomize the linear region */
>>>      memstart_offset_seed = seed >> 48;
>>> <snip..>
>>> }
>>>
>>> 3. Now, we use the 'memstart_offset_seed' value to randomize the
>>> 'memstart_addr' value in 'arch/arm64/mm/init.c':
>>>
>>> void __init arm64_memblock_init(void)
>>> {
>>> <snip..>
>>>
>>>      if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
>>>          extern u16 memstart_offset_seed;
>>>          u64 range = linear_region_size -
>>>                  (memblock_end_of_DRAM() - memblock_start_of_DRAM());
>>>
>>>          /*
>>>           * If the size of the linear region exceeds, by a sufficient
>>>           * margin, the size of the region that the available physical
>>>           * memory spans, randomize the linear region as well.
>>>           */
>>>          if (memstart_offset_seed > 0 && range >= ARM64_MEMSTART_ALIGN) {
>>>              range = range / ARM64_MEMSTART_ALIGN + 1;
>>>              memstart_addr -= ARM64_MEMSTART_ALIGN *
>>>                       ((range * memstart_offset_seed) >> 16);
>>>          }
>>>      }
>>> <snip..>
>>> }
>>>
>>> 4. Since 'memstart_addr' indicates the start of physical RAM, we
>>> randomize it on the basis of the 'memstart_offset_seed' value above.
>>> Also, 'memstart_addr' is available in '/proc/kallsyms' and hence can
>>> be accessed by user-space applications.
>>>
>>> 5. Now since the PAGE_OFFSET value is also used by several user-space
>>> tools (e.g. the makedumpfile tool uses it to determine the start of
>>> the linear region and hence to read PT_NOTE fields from /proc/kcore), I
>>> am not sure how to read its randomized value in the KASLR enabled
>>> case.
>>>
>>> 6. Reading the code further and adding some debug prints, it seems the
>>> 'memblock_start_of_DRAM()' value is closer to the actual start of the
>>> linear region than 'memstart_addr' and 'PAGE_OFFSET' in the case of a
>>> KASLR enabled kernel:
>>>
>>> [root at qualcomm-amberwing] # dmesg | grep -i "arm64_memblock_init" -A 5
>>>
>>> [    0.000000] inside arm64_memblock_init, memstart_addr = ffff976a00000000,
>>> linearstart_addr = ffffe89600200000, memblock_start_of_DRAM = ffffe89600200000,
>>> PHYS_OFFSET = ffff976a00000000, PAGE_OFFSET = ffff800000000000,
>>> KIMAGE_VADDR = ffff000008000000, kimage_vaddr = ffff20c2d7800000
>>>
>>> [root at qualcomm-amberwing] # dmesg | grep -i "Virtual kernel memory layout" -A 15
>>> [    0.000000] Virtual kernel memory layout:
>>> [    0.000000]     modules : 0xffff000000000000 - 0xffff000008000000   (   128 MB)
>>> [    0.000000]     vmalloc : 0xffff000008000000 - 0xffff7bdfffff0000   (126847 GB)
>>> [    0.000000]       .text : 0xffff20c2d7880000 - 0xffff20c2d8040000   (  7936 KB)
>>> [    0.000000]     .rodata : 0xffff20c2d8040000 - 0xffff20c2d83a0000   (  3456 KB)
>>> [    0.000000]       .init : 0xffff20c2d83a0000 - 0xffff20c2d8750000   (  3776 KB)
>>> [    0.000000]       .data : 0xffff20c2d8750000 - 0xffff20c2d891b200   (  1837 KB)
>>> [    0.000000]        .bss : 0xffff20c2d891b200 - 0xffff20c2d90a5198   (  7720 KB)
>>> [    0.000000]     fixed   : 0xffff7fdffe790000 - 0xffff7fdffec00000   (  4544 KB)
>>> [    0.000000]     PCI I/O : 0xffff7fdffee00000 - 0xffff7fdfffe00000   (    16 MB)
>>> [    0.000000]     vmemmap : 0xffff7fe000000000 - 0xffff800000000000   (   128 GB maximum)
>>> [    0.000000]               0xffff7ffa25800800 - 0xffff7ffa2b800000   (    95 MB actual)
>>> [    0.000000]     memory  : 0xffffe89600200000 - 0xffffe8ae00000000   ( 98302 MB)
>>>
>>> As one can see above, the 'memblock_start_of_DRAM()' value of
>>> 0xffffe89600200000 represents the start of linear region:
>>>
>>> [    0.000000]     memory  : 0xffffe89600200000 - 0xffffe8ae00000000   ( 98302 MB)
>>>
>>> So, my question is: to access the start of the linear region (which was
>>> earlier determinable via the PAGE_OFFSET macro), should I:
>>>
>>> - do some back-computation for the start of the linear region from
>>> 'memstart_addr' in user-space, or
>>> - use a new global variable in the kernel which is assigned the value of
>>> 'memblock_start_of_DRAM()' and exposed via '/proc/kallsyms', so that
>>> it can be read by user-space tools, or
>>> - look at removing the PAGE_OFFSET usage from the kernel and replacing
>>> it with a global variable which is properly updated for the KASLR
>>> case as well?
>>>
>>> Kindly share your opinions on what can be a suitable solution in this case.
>>>
>>> Thanks for your help.
>>>
>>
>> Hello Bhupesh,
>>
>> Could you explain what the relevance is of PAGE_OFFSET to userland?
>> The only thing that should matter is where the actual linear mapping
>> of DRAM is, and I am not sure I understand why we care about where it
>> resides relative to the base of the linear region.
> 
> Actually, certain user-space tools like makedumpfile (which is used to
> generate and compress the vmcore) and crash-utility (which is used to
> debug the vmcore) rely on the PAGE_OFFSET value (which denotes the
> base of the linear map region) to determine the virtual-to-physical
> mapping of addresses lying in the linear region.
> 
> One specific use case that I am working on at the moment is the
> makedumpfile '--mem-usage' option, which shows the number of pages of
> the current system (1st kernel) in each kind of use (please see
> MAKEDUMPFILE(8) for more details).
> 
> Using this we can know how many pages are dumpable when a given
> dump_level is specified when invoking makedumpfile.
> 
> Normally, makedumpfile analyses the contents of '/proc/kcore' (while
> excluding the crashkernel range), and then calculates the number of
> pages of each kind per vmcoreinfo.
> 
> For example, here is the output from my arm64 board (a non-KASLR boot):
> 
>      TYPE            PAGES                   EXCLUDABLE      DESCRIPTION
>      ----------------------------------------------------------------------
>      ZERO            49524                   yes             Pages filled with zero
>      NON_PRI_CACHE   15143                   yes             Cache pages without private flag
>      PRI_CACHE       29147                   yes             Cache pages with private flag
>      USER            3684                    yes             User process pages
>      FREE            1450569                 yes             Free pages
>      KERN_DATA       14243                   no              Dumpable kernel data
> 
>      page size:              65536
>      Total pages on system:  1562310
>      Total size on system:   102387548160     Byte
> 
> This use case requires directly reading '/proc/kcore', and hence the
> PAGE_OFFSET value is used to determine the base address of the linear
> region, whose value is not static in the case of a KASLR boot.
> 
> Another use case is where crash-utility uses the PAGE_OFFSET value
> to perform a virtual-to-physical conversion for addresses lying in
> the linear region:
> 
> ulong
> arm64_VTOP(ulong addr)
> {
>      if (machdep->flags & NEW_VMEMMAP) {
>          if (addr >= machdep->machspec->page_offset)
>              return machdep->machspec->phys_offset
>                  + (addr - machdep->machspec->page_offset);
> 
> <..snip..>
> }
> 
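
For reference, the linear-map conversion quoted above can be modelled in user space. This is only a sketch under stated assumptions: page_offset_for() and linear_vtop() are hypothetical helper names (not taken from the kernel or crash-utility), and the sample values mentioned afterwards are the ones printed in the debug output quoted earlier (VA_BITS = 48, memstart_addr = 0xffff976a00000000).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: mirrors the PAGE_OFFSET macro quoted from
 * 'arch/arm64/include/asm/memory.h' for a given VA_BITS value.
 */
static inline uint64_t page_offset_for(unsigned int va_bits)
{
	assert(va_bits >= 1 && va_bits <= 64);
	return UINT64_MAX - (UINT64_C(1) << (va_bits - 1)) + 1;
}

/*
 * Hypothetical helper: linear-map virtual-to-physical conversion,
 * phys = phys_offset + (vaddr - page_offset).
 * The arithmetic is unsigned 64-bit and is allowed to wrap around,
 * which matters when phys_offset has been randomized to a "negative"
 * value.
 */
static inline uint64_t linear_vtop(uint64_t vaddr, uint64_t page_offset,
				   uint64_t phys_offset)
{
	return phys_offset + (vaddr - page_offset);
}
```

With the logged values, linear_vtop(0xffffe89600200000, page_offset_for(48), 0xffff976a00000000) evaluates to 0x200000, because the 64-bit addition wraps around.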

Another confusing point is the rounded-down value of 'memstart_addr' in
'arch/arm64/mm/init.c' when booting a non-KASLR kernel and when the
value of memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN:

void __init arm64_memblock_init(void)
{

<..snip..>
	/*
	 * Select a suitable value for the base of physical memory.
	 */
	memstart_addr = round_down(memblock_start_of_DRAM(),
				   ARM64_MEMSTART_ALIGN);
<..snip..>
}

For example, consider a case (which I see on my Qualcomm board)
where memblock_start_of_DRAM() = 0x200000 and ARM64_MEMSTART_ALIGN =
0x40000000 (I am using VA_BITS = 48 and a 64K page size). In this case
memstart_addr is calculated as 0, since the round_down() yields 0.
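
The rounding can be reproduced in user space. The sketch below assumes a power-of-two alignment; round_down_pow2() is a hypothetical stand-in for the kernel's round_down() macro, not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's round_down(): clears the low
 * bits of 'x' so that the result is a multiple of 'align'.
 * 'align' must be a non-zero power of two.
 */
static inline uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}
```

Here round_down_pow2(0x200000, 0x40000000) returns 0, whereas a DRAM start at or above the alignment keeps a non-zero base, e.g. round_down_pow2(0x80200000, 0x40000000) returns 0x80000000.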

This is in contrast with the definition of the 'memblock_start_of_DRAM':

/* lowest address */
phys_addr_t __init_memblock memblock_start_of_DRAM(void)
{
	return memblock.memory.regions[0].base;
}

As indicated by the logs below, the first memblock region starts at
0x200000 rather than at the 'memstart_addr' value (which is 0):

# dmesg | grep -i "Processing" -A 5
[    0.000000] efi: Processing EFI memory map:
[    0.000000] efi:   0x000000200000-0x00000021ffff [Runtime Data |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
[    0.000000] efi:   0x000000400000-0x0000005fffff [ACPI Memory NVS |   |  |  |  |  |  |  |   |  |  |  |UC]

# head -1 /proc/iomem
00200000-0021ffff : reserved
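
A user-space tool that takes a base address from '/proc/iomem' has to parse the range at the start of such a line. A minimal sketch of that parsing step, assuming the 'start-end : name' line format shown above; parse_iomem_range() is a hypothetical helper and is not taken from makedumpfile or crash-utility:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse one '/proc/iomem'-style line of the form
 * "00200000-0021ffff : reserved" into its start and end addresses.
 * Returns 0 on success, -1 if the line does not begin with a
 * "hex-hex" range.
 */
static inline int parse_iomem_range(const char *line, uint64_t *start,
				    uint64_t *end)
{
	assert(line != NULL && start != NULL && end != NULL);
	if (sscanf(line, "%" SCNx64 "-%" SCNx64, start, end) != 2)
		return -1;
	return 0;
}
```

For the line shown above, this yields start = 0x200000 and end = 0x21ffff.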

Since we define 'PHYS_OFFSET' as the physical address of the start of
memory, it would be 0 in this case:

/* PHYS_OFFSET - the physical address of the start of memory. */
#define PHYS_OFFSET		({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

On the other hand, the first memblock starts at 0x200000. So my
question is whether we should update the user-space tools, which
currently obtain PHYS_OFFSET from the memblocks listed in '/proc/iomem'
(by reading the base of the 1st memblock), to instead read the value of
'memstart_addr' somehow in user-space, or whether the change should be
made on the kernel side to calculate 'memstart_addr' as:


	/*
	 * Select a suitable value for the base of physical memory.
	 */
	memstart_addr = round_down(memblock_start_of_DRAM(),
				   ARM64_MEMSTART_ALIGN);
	if (!memstart_addr)
		memstart_addr = memblock_start_of_DRAM();

Please share your views.

Thanks,
Bhupesh


