Issue on reserving memory with no-map flag in DT
Vlastimil Babka
vbabka at suse.cz
Tue Jan 20 01:54:55 PST 2015
On 01/20/2015 12:57 AM, Laura Abbott wrote:
> On 1/19/2015 7:49 AM, Vlastimil Babka wrote:
>> On 01/17/2015 01:24 AM, Laura Abbott wrote:
>>
>> I admit I may not see clearly through all the arch-specific layers and various
>> config option combinations that are possible here, so I might be misinterpreting
>> the code. But I think the problem here is not insufficient allocation size, but
>> something else.
>>
>> The code above continues by this line:
>>
>> pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
>>
>> So, size for the map allocation has already been calculated aligned to
>> MAX_ORDER_NR_PAGES before your patch, and node_mem_map points to the first
>> actually present page, which might be offset from the perfect alignment. Your
>> patch adds another offset to the already aligned size (but you use
>> pageblock_nr_pages which might be lower than MAX_ORDER_NR_PAGES; this seems like
>> a mistake in itself?). So with your patch we have map of aligned size starting
>> from the node_mem_map. This means the last offset-worth of struct pages should
>> be beyond what's needed to access struct page of pgdat_end_pfn(). If we need
>> that extra padding to prevent crashing, then it looks really suspicious...
>>
>> And when I look at node_mem_map usage, I see include/asm/generic/memory_model.h
>> defines __pfn_to_page as (basically)
>>
>> NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
>>
>> and further above is a generic definition of arch_local_page_offset:
>>
>> #define arch_local_page_offset(pfn, nid) \
>> ((pfn) - NODE_DATA(nid)->node_start_pfn)
>>
>> So it looks correct to me without your patch. The map is allocated aligned,
>> node_mem_map points to this map at the offset corresponding to node_start_pfn,
>> and pfn_to_page subtracts node_start_pfn to get the offset relative to
>> node_mem_map. We shouldn't need the extra padding by the node_start_pfn offset,
>> unless something else is misbehaving here.
>>
>> In the issue fixed by 7c45512 that you refer to, the problem was basically that
>> the allocation didn't use aligned size, but this looks different to me?
>>
>>
>
> With this hard coded debugging:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..241b870 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5029,6 +5029,11 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> map = memblock_virt_alloc_node_nopanic(size,
> pgdat->node_id);
> pgdat->node_mem_map = map + (pgdat->node_start_pfn - start);
> + pr_err(">>> node_start_pfn %lx node_end_pfn %lx\n",
> + pgdat->node_start_pfn, pgdat_end_pfn(pgdat));
> + pr_err(">>> size calculated %lx\n", size);
> + pr_err(">>> allocated region %p-%lx\n", map, ((unsigned long)map)+size);
> +
> }
> #ifndef CONFIG_NEED_MULTIPLE_NODES
> /*
> @@ -5043,6 +5048,8 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> }
> #endif
> #endif /* CONFIG_FLAT_NODE_MEM_MAP */
> + pr_err(">>> pfn %lx page %p\n", 0x200, pfn_to_page(0x200));
> + pr_err(">>> pfn %lx page %p\n", 0xbffff, pfn_to_page(0xbffff));
> }
>
> void __paginginit free_area_init_node(int nid, unsigned long *zones_size,
>
> I get this output:
> [ 0.000000] >>> node_start_pfn 200 node_end_pfn c0000
> [ 0.000000] >>> size calculated 1800000
> [ 0.000000] >>> allocated region edffa000-ef7fa000
> [ 0.000000] >>> pfn 200 page ee002000
> [ 0.000000] >>> pfn bffff page ef7fdfe0
>
> The start and end pfn values are correct but that page value is outside of the
> allocated region for the memory map. This is a CONFIG_FLATMEM system so we
> aren't actually using arch_local_page_offset at all:
>
>
> #define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> #define __page_to_pfn(page) ((unsigned long)((page) - mem_map) + \
> ARCH_PFN_OFFSET)
Ah, OK. I searched just for node_mem_map and didn't notice it's also assigned to
mem_map.
> If you do the math, the array size is fine if we don't offset by the
> start but alloc_node_mem_map offsets assuming pfn_to_page will offset
> as well but this doesn't happen in CONFIG_FLATMEM.
>
> Either alloc_node_mem_map needs to drop the offset or the pfn_to_page
> functions need to start adding the offset. It's worth noting that
> this gets corrected properly if we have CONFIG_HAVE_MEMBLOCK_NODE_MAP enabled
> so perhaps the fix is to unoffset for flatmem as well:
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7633c50..271c44b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5036,7 +5036,7 @@ static void __init_refok alloc_node_mem_map(struct pglist_data *pgdat)
> */
> if (pgdat == NODE_DATA(0)) {
> mem_map = NODE_DATA(0)->node_mem_map;
> -#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
> +#if defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) || defined(CONFIG_FLATMEM)
> if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
> mem_map -= (pgdat->node_start_pfn - ARCH_PFN_OFFSET);
But is this correcting the same thing? The offset that's added earlier is
(pgdat->node_start_pfn - start) where "start" is just alignment of the
node_start_pfn to MAX_ORDER_NR_PAGES. But here we subtract whole
pgdat->node_start_pfn, minus a ARCH_PFN_OFFSET constant. Is the constant always
equeal to the earlier value of "start", which is calculated dynamically?.
So I agree that mem_map assignment should be fixed, but maybe not exactly like this?
> #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
>
> Thanks,
> Laura
>
More information about the linux-arm-kernel
mailing list