[PATCH] ARM: sparsemem: Enable CONFIG_HOLES_IN_ZONE config option for SparseMem and HAS_HOLES_MEMORYMODEL for linux-3.0.

Mel Gorman mgorman at suse.de
Thu Aug 4 06:09:28 EDT 2011


On Thu, Aug 04, 2011 at 03:06:39PM +0530, Kautuk Consul wrote:
> Hi Mel,
> 
> My ARM system has 2 memory banks which have the following 2 PFN ranges:
> 60000-62000 and 70000-7ce00.
> 
> My SECTION_SIZE_BITS is #defined to 23.
> 

So bank 0 is 4 sections and bank 1 is 26 sections with the last section
incomplete.

> I am altering the ranges via the following kind of pseudo-code in the
> arch/arm/mach-*/mach-*.c file:
> meminfo->bank[0].size -= (1 << 20)
> meminfo->bank[1].size -= (1 << 20)
> 

Why are you taking 1M off each bank? I could understand aligning the
banks to a section size at least.

That said, there is an assumption that pages within a MAX_ORDER-aligned
block. If you are taking 1M off the end of bank 0, it is no longer
MAX_ORDER aligned. This is the assumption move_freepages_block()
is falling foul of. The problem can be avoided by ensuring that memmap
is valid within MAX_ORDER-aligned ranges.

> After altering the size of both memory banks, the PFN ranges now
> visible to the kernel are:
> 60000-61f00 and 70000-7cd00.
> 
> There is only one node and one ZONE_NORMAL zone on the system and this
> zone accounts for both memory banks as CONFIG_SPARSEMEM
> is enabled in the kernel.
> 

Ok.

> I put some printks in the move_freeblockpages() function, compiled the
> kernel and then ran the following commands:
> ifconfig eth0 107.109.39.102 up
> mount /dev/sda1 /mnt   # Mounted the USB pen drive
> mount -t nfs -o nolock 107.109.39.103:/home/kautuk nfsmnt   # NFS
> mounted an NFS share
> cp test_huge_file nfsmnt/
> 
> I got the following output:
> #> cp test_huge nfsmnt/
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c068a000 start_pfn=7c400 start_page=c0686000
> end_page=c068dfe0 end_pfn=7c7ff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c0652000 start_pfn=7a800 start_page=c064e000
> end_page=c0655fe0 end_pfn=7abff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c065a000 start_pfn=7ac00 start_page=c0656000
> end_page=c065dfe0 end_pfn=7afff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c0695000 start_pfn=7c800 start_page=c068e000
> end_page=c0695fe0 end_pfn=7cbff
> page_zone(start_page)=c048107c page_zone(end_page)=c048107c
> page_zonenum(end_page) = 0
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> ------------------------------------------------------------move_freepages_block_start----------------------------------------
> kc: The page=c04f7c00 start_pfn=61c00 start_page=c04f6000
> end_page=c04fdfe0 end_pfn=61fff
> page_zone(start_page)=c048107c page_zone(end_page)=c0481358
> page_zonenum(end_page) = 1
> ------------------------------------------------------------move_freepages_block_end----------------------------------------
> kernel BUG at mm/page_alloc.c:849!
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = ce9fc000
> [00000000] *pgd=7ca5a031, *pte=00000000, *ppte=00000000
> 
> As per the last line, we can clearly see that the
> page_zone(start_page)=c048107c and page_zone(end_page)=c0481358,
> which are not equal to each other.
> Since they do not match, the code in move_freepages() bugchecks
> because of the following BUG_ON() check:
> page_zone(start_page) != page_zone(end_page)

> The reason for this that the page_zonenum(end_page) is equal to 1 and
> this is different from the page_zonenum(start_page) which is 0.
> 

Because the MAX_ORDER alignment is gone.

> On checking the code within page_zonenum(), I see that this code tries
> to retrieve the zone number from the end_page->flags.
> 

Yes. In the majority of cases a pages node and zone is stored in the
page->flags.

> The reason why we cannot expect the 0x61fff end_page->flags to contain
> a valid zone number is:
> memmap_init_zone() initializes the zone number of all pages for a zone
> via the set_page_links() inline function.
> For the end_page (whose PFN is 0x61fff), set_page_links() cannot be
> possibly called, as the zones are simply not aware of of PFNs above
> 0x61f00 and below 0x70000.
> 

Can you ensure that the ranges passed into free_area_init_node()
are MAX_ORDER aligned as this would initialise the struct pages. You
may have already seen that care is taken when freeing memmap that it
is aligned to MAX_ORDER in free_unused_memmap() in ARM.

> The (end >= zone->zone_start_pfn + zone->spanned_pages) in
> move_freepages_block() does not stop this crash from happening as both
> our memory banks are in the same zone and the empty space within them
> is accomodated into this zone via the CONFIG_SPARSEMEM
> config option.
> 
> When we enable CONFIG_HOLES_IN_ZONE we survive this BUG_ON as well as
> any other BUG_ONs in the loop in move_freepages() as then the
> pfn_valid_within()/pfn_valid() function takes care of this
> functionality, especially in the case where the newly introduced
> CONFIG_HAVE_ARCH_PFN_VALID is
> enabled.
> 

This is an expensive option in terms of performance. If Russell
wants to pick it up, I won't object but I would strongly suggest that
you solve this problem by ensuring that memmap is initialised on a
MAX_ORDER-aligned boundaries as it'll perform better.

Thanks.

-- 
Mel Gorman
SUSE Labs



More information about the linux-arm-kernel mailing list