[PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.
Mel Gorman
mgorman at suse.de
Thu Aug 4 06:09:28 EDT 2011
On Thu, Aug 04, 2011 at 03:06:39PM +0530, Kautuk Consul wrote:
> Hi Mel,
>
> My ARM system has 2 memory banks which have the following 2 PFN ranges:
> 60000-62000 and 70000-7ce00.
>
> My SECTION_SIZE_BITS is #defined to 23.
>
So bank 0 is 4 sections and bank 1 is 26 sections with the last section
incomplete.
> I am altering the ranges via the following kind of pseudo-code in the
> arch/arm/mach-*/mach-*.c file:
> meminfo->bank[0].size -= (1 << 20)
> meminfo->bank[1].size -= (1 << 20)
>
Why are you taking 1M off each bank? I could understand aligning the
banks to a section size at least.
That said, there is an assumption that pages within a MAX_ORDER-aligned
block. If you are taking 1M off the end of bank 0, it is no longer
MAX_ORDER aligned. This is the assumption move_freepages_block()
is falling foul of. The problem can be avoided by ensuring that memmap
is valid within MAX_ORDER-aligned ranges.
> After altering the size of both memory banks, the PFN ranges now
> visible to the kernel are:
> 60000-61f00 and 70000-7cd00.
>
> There is only one node and one ZONE_NORMAL zone on the system and this
> zone accounts for both memory banks as CONFIG_SPARSEMEM
> is enabled in the kernel.
>
Ok.
> I put some printks in the move_freeblockpages() function, compiled the
> kernel and then ran the following commands:
> ifconfig eth0 107.109.39.102 up
> mount /dev/sda1 /mnt # Mounted the USB pen drive
> mount -t nfs -o nolock 107.109.39.103:/home/kautuk nfsmnt # NFS
> mounted an NFS share
> cp test_huge_file nfsmnt/
>
> I got the following output:
> #> cp test_huge nfsmnt/
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c068a000 start_pfn=7c400 start_page=c0686000
> end_page=c068dfe0 end_pfn=7c7ff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c0652000 start_pfn=7a800 start_page=c064e000
> end_page=c0655fe0 end_pfn=7abff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c065a000 start_pfn=7ac00 start_page=c0656000
> end_page=c065dfe0 end_pfn=7afff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c0695000 start_pfn=7c800 start_page=c068e000
> end_page=c0695fe0 end_pfn=7cbff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c04f7c00 start_pfn=61c00 start_page=c04f6000
> end_page=c04fdfe0 end_pfn=61fff
> page_zone(start_page)=c048107c page_zone(end_page)=c0481358
> page_zonenum(end_page) = 1
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> kernel BUG at mm/page_alloc.c:849!
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = ce9fc000
> [00000000] *pgd=7ca5a031, *pte=00000000, *ppte=00000000
>
> As per the last line, we can clearly see that the
> page_zone(start_page)=c048107c and page_zone(end_page)=c0481358,
> which are not equal to each other.
> Since they do not match, the code in move_freepages() bugchecks
> because of the following BUG_ON() check:
> page_zone(start_page) != page_zone(end_page)
> The reason for this that the page_zonenum(end_page) is equal to 1 and
> this is different from the page_zonenum(start_page) which is 0.
>
Because the MAX_ORDER alignment is gone.
> On checking the code within page_zonenum(), I see that this code tries
> to retrieve the zone number from the end_page->flags.
>
Yes. In the majority of cases a pages node and zone is stored in the
page->flags.
> The reason why we cannot expect the 0x61fff end_page->flags to contain
> a valid zone number is:
> memmap_init_zone() initializes the zone number of all pages for a zone
> via the set_page_links() inline function.
> For the end_page (whose PFN is 0x61fff), set_page_links() cannot be
> possibly called, as the zones are simply not aware of of PFNs above
> 0x61f00 and below 0x70000.
>
Can you ensure that the ranges passed into free_area_init_node()
are MAX_ORDER aligned as this would initialise the struct pages. You
may have already seen that care is taken when freeing memmap that it
is aligned to MAX_ORDER in free_unused_memmap() in ARM.
> The (end >= zone->zone_start_pfn + zone->spanned_pages) in
> move_freepages_block() does not stop this crash from happening as both
> our memory banks are in the same zone and the empty space within them
> is accomodated into this zone via the CONFIG_SPARSEMEM
> config option.
>
> When we enable CONFIG_HOLES_IN_ZONE we survive this BUG_ON as well as
> any other BUG_ONs in the loop in move_freepages() as then the
> pfn_valid_within()/pfn_valid() function takes care of this
> functionality, especially in the case where the newly introduced
> CONFIG_HAVE_ARCH_PFN_VALID is
> enabled.
>
This is an expensive option in terms of performance. If Russell
wants to pick it up, I won't object but I would strongly suggest that
you solve this problem by ensuring that memmap is initialised on a
MAX_ORDER-aligned boundaries as it'll perform better.
Thanks.
--
Mel Gorman
SUSE Labs
More information about the linux-arm-kernel
mailing list