[PATCHv2] mm: Don't offset memmap for flatmem
Vlastimil Babka
vbabka at suse.cz
Fri Feb 27 07:24:56 PST 2015
On 02/24/2015 08:54 PM, Laura Abbott wrote:
> Reviving this thread because I don't think it ever got resolved.
>
> On 2/3/2015 6:25 PM, Laura Abbott wrote:
>> On 1/29/2015 5:13 AM, Vlastimil Babka wrote:
>>>>
>>>> I don't recall introducing ARCH_PFN_OFFSET, are you sure it was me? I'm just
>>>> back today after been offline a week so didn't review the patch but IIRC,
>>>> ARCH_PFN_OFFSET deals with the case where physical memory does not start
>>>> at 0. Without the offset, virtual _PAGE_OFFSET would not physical page 0.
>>>> I don't recall it being related to the alignment of node 0 so if there
>>>> are crashes due to misalignment of node 0 and the fix is ARCH_PFN_OFFSET
>>>> related then I'm surprised.
>>>
>>> You're right that ARCH_PFN_OFFSET wasn't added by you, but by commit
>>> 467bc461d2 which was a bugfix to your commit c713216dee, which did
>>> introduce the mem_map correction code, and after which the code looked like:
>>>
>>> mem_map = NODE_DATA(0)->node_mem_map;
>>> #ifdef CONFIG_ARCH_POPULATES_NODE_MAP
>>> if (page_to_pfn(mem_map) != pgdat->node_start_pfn)
>>> mem_map -= pgdat->node_start_pfn;
>>> #endif /* CONFIG_ARCH_POPULATES_NODE_MAP */
>>>
>>>
>>> It's from 2006 so I can't expect you remember the details, but I had some
>>> trouble finding out what this does. I assume it makes sure that mem_map points
>>> to struct page corresponding to pfn 0, because that's what translations using
>>> mem_map expect.
>>> But pgdat->node_mem_map points to struct page corresponding to
>>> pgdat->node_start_pfn, which might not be 0. So it subtracts node_start_pfn
>>> to fix that. This is OK, as the node_mem_map is allocated (in this very
>>> function) with padding so that it covers a MAX_ORDER_NR_PAGES aligned area
>>> where node_mem_map may point to the middle of it.
>>>
>>> Commit 467bc461d2 fixed this in case the first pfn is not 0, but ARCH_PFN_OFFSET.
>>> So mem_map points to struct page corresponding to pfn=ARCH_PFN_OFFSET, which
>>> is OK. But I still have few doubts:
>>>
>>> 1) The "if (page_to_pfn(mem_map) != pgdat->node_start_pfn)" sort of silently
>>> assumes that mem_map is allocated at the beginning of the node, i.e. at
>>> pgdat->node_start_pfn. And the only reason for this if-condition to be true,
>>> is that we haven't corrected the page_to_pfn translation, which uses mem_map.
>>> Is this assumption always OK to do? Shouldn't the if-condition be instead about
>>> pgdat->node_start_pfn not being aligned?
>>>
>>> 2) The #ifdef guard is about CONFIG_ARCH_POPULATES_NODE_MAP, which is nowadays called > CONFIG_HAVE_MEMBLOCK_NODE_MAP. But shouldn't it be #ifdef FLATMEM instead?
>>> After all, we are correcting value of mem_map based on page_to_pfn code
>>> variant used on FLATMEM. arm doesn't define
>>> CONFIG_ARCH_POPULATES_NODE_MAP but apparently needs this correction.
>>>
>>
>> Just doing #ifdef FLATMEM doesn't work because ARCH_PFN_OFFSET doesn't
>> seem to be picked up properly for NOMMU arches properly. Probably just
>> missing a header somewhere.
>>
>>> 3) The node_mem_map allocation code aligns the allocation to MAX_ORDER_NR_PAGES,
>>> so the offset between the start of the allocated map and where node_mem_map
>>> points to will be up to MAX_ORDER_NR_PAGES.
>>> However, here we subtract (in current kernel) (pgdat->node_start_pfn - ARCH_PFN_OFFSET).
>>> That looks like another silent assumption, that pgdat->node_start_pfn is always
>>> between ARCH_PFN_OFFSET and ARCH_PFN_OFFSET + MAX_ORDER_NR_PAGES. If it were
>>> larger, the mem_map correction would subtract too much and end up below what
>>> was allocated for node_mem_map, no? The bug report behind this patch said that
>>> first 2MB of memory was reserved using "no-map flag using DT". Unless this somehow
>>> translates to ARCH_PFN_OFFSET at build time, we would underflow mem_map, right?
>>> Maybe I'm just overly paranoid here and of course ARCH_PFN_OFFSET is determined
>>> properly on arm...
>>>
>>> If anyone can confirm my doubts or point me to what I'm missing, thanks.
>>
>> ARCH_PFN_OFFSET should always be the lowest PFN in the system, otherwise
>> I think plenty of other things are broken given how many architectures
>> make this assumption. That said, I don't think subtracting ARCH_PFN_OFFSET
>> makes it obvious why the adjustment is being made.
>>
>> Thanks,
>> Laura
>>
>
> I was incorrect before: it isn't just NOMMU but architectures that don't use
> asm-generic/memory_model.h which failed to compile. I could respin with
Hm I see, some architectures use own variant of page_to_pfn, that's why it's
being used in the if () check.
> more ifdefery around the ARCH_PFN_OFFSET if that sounds reasonable.
So I think your v2 might be correct already. Unless there's an architecture that
defines CONFIG_FLATMEM and not CONFIG_HAVE_MEMBLOCK_NODE_MAP and places memmap
somewhere else than pgdat->node_start_pfn, which would trigger the check for a
wrong reason after the patch.
Looks like arm is an arch that doesn't define CONFIG_HAVE_MEMBLOCK_NODE_MAP, yet
it defines ARCH_PFN_OFFSET. With your patch it would correct memmap by the
calculated offset, not the ARCH_PFN_OFFSET constant. Are these two the same
then? Should there be something like a VM_BUG_ON that ARCH_PFN_OFFSET (if it
exists) is indeed equal to the calculated offset? Or maybe a more general
VM_BUG_ON checking that after any correction we make, the (page_to_pfn(mem_map)
== pgdat->node_start_pfn) condition holds?
> Thanks,
> Laura
>
More information about the linux-arm-kernel
mailing list