[next-20150119]regression (mm)?

Kirill A. Shutemov kirill.shutemov at linux.intel.com
Mon Jan 19 16:16:43 PST 2015


Felipe Balbi wrote:
> Hi,
> 
> On Mon, Jan 19, 2015 at 10:42:04AM -0600, Nishanth Menon wrote:
> > Most platforms seem broken intoday's next tag.
> > 
> > https://github.com/nmenon/kernel-test-logs/tree/next-20150119
> > (defconfig: omap2plus_defconfig)
> > 
> > > [    7.166600] ------------[ cut here ]------------
> > > [    7.171676] WARNING: CPU: 0 PID: 54 at mm/mmap.c:2859 exit_mmap+0x1a8/0x21c()
> > > [    7.179194] Modules linked in:
> > > [    7.182479] CPU: 0 PID: 54 Comm: init Not tainted 3.19.0-rc5-next-20150119-00002-gfdefcded1272 #1
> > > [    7.191863] Hardware name: Generic AM33XX (Flattened Device Tree)
> > > [    7.198318] [<c00153f0>] (unwind_backtrace) from [<c0011a74>] (show_stack+0x10/0x14)
> > > [    7.206528] [<c0011a74>] (show_stack) from [<c0580150>] (dump_stack+0x78/0x94)
> > > [    7.214191] [<c0580150>] (dump_stack) from [<c003d4d0>] (warn_slowpath_common+0x7c/0xb4)
> > > [    7.222751] [<c003d4d0>] (warn_slowpath_common) from [<c003d524>] (warn_slowpath_null+0x1c/0x24)
> > > [    7.232038] [<c003d524>] (warn_slowpath_null) from [<c012de64>] (exit_mmap+0x1a8/0x21c)
> > > [    7.240536] [<c012de64>] (exit_mmap) from [<c003abb8>] (mmput+0x44/0xec)
> > > [    7.247612] [<c003abb8>] (mmput) from [<c0151368>] (flush_old_exec+0x300/0x5a4)
> > > [    7.255357] [<c0151368>] (flush_old_exec) from [<c0195c10>] (load_elf_binary+0x2ec/0x1144)
> > > [    7.264111] [<c0195c10>] (load_elf_binary) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.273311] [<c0150ea0>] (search_binary_handler) from [<c019554c>] (load_script+0x260/0x280)
> > > [    7.282232] [<c019554c>] (load_script) from [<c0150ea0>] (search_binary_handler+0x88/0x1ac)
> > > [    7.291066] [<c0150ea0>] (search_binary_handler) from [<c0151f0c>] (do_execveat_common+0x538/0x6c4)
> > > [    7.300628] [<c0151f0c>] (do_execveat_common) from [<c01520c4>] (do_execve+0x2c/0x34)
> > > [    7.308881] [<c01520c4>] (do_execve) from [<c000e5e0>] (ret_fast_syscall+0x0/0x4c)
> > > [    7.316881] ---[ end trace 3b8a46b1b280f423 ]---
> 
> seems like it's caused by:
> 
> b316feb3c37ff19cddcaf1f6b5056c633193257d is the first bad commit
> 
> Adding Kiryl to the loop.
> 
> git bisect start
> # good: [ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc] Linux 3.19-rc5
> git bisect good ec6f34e5b552fb0a52e6aae1a5afbbb1605cc6cc
> # bad: [a0d4287f787889e59db0fd295853a0f1f55d0699] Add linux-next specific files for 20150119
> git bisect bad a0d4287f787889e59db0fd295853a0f1f55d0699
> # good: [1c2f70b77b8ca77f10c59d479d009e07359d00d2] Merge remote-tracking branch 'drm/drm-next'
> git bisect good 1c2f70b77b8ca77f10c59d479d009e07359d00d2
> # good: [73c1390843223d8bfc85795c560c36b3d0ffee40] Merge remote-tracking branch 'leds/for-next'
> git bisect good 73c1390843223d8bfc85795c560c36b3d0ffee40
> # good: [7bc6bef35d48e91ad796b6eead7304998842c782] Merge remote-tracking branch 'pinctrl/for-next'
> git bisect good 7bc6bef35d48e91ad796b6eead7304998842c782
> # bad: [45e1eaa38732ffa3de0d18fe95d2d2b960a7c777] lib: bitmap: change bitmap_shift_right to take unsigned parameters
> git bisect bad 45e1eaa38732ffa3de0d18fe95d2d2b960a7c777
> # good: [c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223] mm, page_alloc: reduce number of alloc_pages* functions' parameters
> git bisect good c82a73a0369a7dd6dcfaf9e6bd572a4e5deda223
> # bad: [0b1c810fbc4bbff7e314dd6ff91c2b4af499199d] mm: don't split THP page when syscall is called
> git bisect bad 0b1c810fbc4bbff7e314dd6ff91c2b4af499199d
> # good: [54faa439355a9ae476a446429967e9e38f04363e] oom, PM: make OOM detection in the freezer path raceless
> git bisect good 54faa439355a9ae476a446429967e9e38f04363e
> # bad: [b6c9f11c6b6993303067f7c04a73258226a6e77e] mm/compaction: add tracepoint to observe behaviour of compaction defer
> git bisect bad b6c9f11c6b6993303067f7c04a73258226a6e77e
> # good: [9ce5d3fb13a80f28db450de4ecf2727893e99c93] mm: pagemap_read: limit scan to virtual region being asked
> git bisect good 9ce5d3fb13a80f28db450de4ecf2727893e99c93
> # bad: [1a7a376546ca56e7750987c15d0c7541c17a512c] mm/compaction: change tracepoint format from decimal to hexadecimal
> git bisect bad 1a7a376546ca56e7750987c15d0c7541c17a512c
> # bad: [4081187ff19cf2186010c003939c17d70d0bbb27] page_writeback: put account_page_redirty() after set_page_dirty()
> git bisect bad 4081187ff19cf2186010c003939c17d70d0bbb27
> # bad: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> git bisect bad b316feb3c37ff19cddcaf1f6b5056c633193257d
> # first bad commit: [b316feb3c37ff19cddcaf1f6b5056c633193257d] mm: account pmd page tables to the process
> 
> I've added a dump_mm() call when the bug happens followed by a
> while (true) loop (to avoid constant reprinting of the same thing),
> here's what I get:
> 
> [    7.235903] ------------[ cut here ]------------
> [    7.240881] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> [    7.248369] Modules linked in: ipv6 autofs4
> [    7.252792] CPU: 0 PID: 58 Comm: systemd Not tainted 3.19.0-rc5-next-20150119-dirty #888
> [    7.261274] Hardware name: Generic AM43 (Flattened Device Tree)
> [    7.267512] [<c0015afc>] (unwind_backtrace) from [<c001221c>] (show_stack+0x10/0x14)
> [    7.275651] [<c001221c>] (show_stack) from [<c058972c>] (dump_stack+0x84/0x9c)
> [    7.283249] [<c058972c>] (dump_stack) from [<c003def0>] (warn_slowpath_common+0x78/0xb4)
> [    7.291750] [<c003def0>] (warn_slowpath_common) from [<c003dfc8>] (warn_slowpath_null+0x1c/0x24)
> [    7.300977] [<c003dfc8>] (warn_slowpath_null) from [<c0133410>] (exit_mmap+0x1b4/0x218)
> [    7.309376] [<c0133410>] (exit_mmap) from [<c003b5f0>] (mmput+0x44/0xec)
> [    7.316385] [<c003b5f0>] (mmput) from [<c0157e68>] (flush_old_exec+0x264/0x5d4)
> [    7.324061] [<c0157e68>] (flush_old_exec) from [<c019f180>] (load_elf_binary+0x288/0x1234)
> [    7.332727] [<c019f180>] (load_elf_binary) from [<c0158304>] (search_binary_handler+0x84/0x1e8)
> [    7.341857] [<c0158304>] (search_binary_handler) from [<c0158c84>] (do_execveat_common+0x53c/0x6b8)
> [    7.351346] [<c0158c84>] (do_execveat_common) from [<c0158e24>] (do_execve+0x24/0x2c)
> [    7.359561] [<c0158e24>] (do_execve) from [<c000e6c0>] (ret_fast_syscall+0x0/0x4c)
> [    7.367485] ---[ end trace 633a89eb76b1d46e ]---
> [    7.372360] mm ed29fa00 mmap ed29b6b8 seqnum 0 task_size 3204448256
> [    7.372360] get_unmapped_area c001cfc0
> [    7.372360] mmap_base 3069620224 mmap_legacy_base 0 highest_vm_end 3202711552
> [    7.372360] pgd ed184000 mm_users 0 mm_count 1 nr_ptes 1 nr_pmds 4294967292 map_count 59
> [    7.372360] hiwater_rss 37 hiwater_vm 37e total_vm 37e locked_vm 0
> [    7.372360] pinned_vm 0 shared_vm 324 exec_vm 254 stack_vm 22
> [    7.372360] start_code 10000 end_code c9d48 start_data da1b8 end_data ea1a4
> [    7.372360] start_brk eb000 brk 10c000 start_stack becd5f10
> [    7.372360] arg_start becd5fd4 arg_end becd5fdf env_start becd5fdf env_end becd5ff1
> [    7.372360] binfmt c08b3158 flags cd core_state   (null)
> [    7.372360] ioctx_table   (null)
> [    7.372360] owner   (null) exe_file ee49d040
> [    7.372360] tlb_flush_pending 0
> [    7.448908] flags: 0x0()
> 
> Looking at nr_pmds, that's basically (unsigned long) -4, which tells me
> we are decrementing mm->nr_pmds without incrementing first. In, when I
> add:
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ba5f3bcca55d..8425fb419eab 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1440,11 +1440,15 @@ static inline unsigned long mm_nr_pmds(struct mm_struct *mm)
>  static inline void mm_inc_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_inc(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  
>  static inline void mm_dec_nr_pmds(struct mm_struct *mm)
>  {
>  	atomic_long_dec(&mm->nr_pmds);
> +	printk(KERN_INFO "===> %s nr_pmds %ld\n", __func__,
> +			atomic_long_read(&mm->nr_pmds));
>  }
>  #endif
> 
> I start getting:
> 
> [...]
> 
> [    5.935390] ===> mm_dec_nr_pmds nr_pmds -1
> [    6.236832] random: systemd urandom read with 34 bits of entropy
> available
> [    6.276340] systemd[1]: systemd 215 running in system mode. (+PAM
> +AUDIT +SELINUX +IMA +SYSVINIT +LIBC
> RYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR)
> [    6.291380] systemd[1]: Detected architecture 'arm'.
> 
> Welcome to Debian GNU/Linux 8 (jessie)!
> 
> [    6.434013] systemd[1]: Inserted module 'autofs4'
> [    7.152037] NET: Registered protocol family 10
> [    7.165770] systemd[1]: Inserted module 'ipv6'
> [    7.170646] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.174932] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.179427] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.258496] ===> mm_dec_nr_pmds nr_pmds -1
> [    7.262809] ===> mm_dec_nr_pmds nr_pmds -2
> [    7.267206] ===> mm_dec_nr_pmds nr_pmds -3
> [    7.271486] ===> mm_dec_nr_pmds nr_pmds -4
> [    7.275884] ------------[ cut here ]------------
> [    7.280773] WARNING: CPU: 0 PID: 58 at mm/mmap.c:2859 exit_mmap+0x1b4/0x218()
> 
> [...]
> 
> Which confirms my suspicion. So we never increment nr_pmds, but we
> decrement it. The simplest "fix" is to make mm_nr_pmds() return a signed
> long (see below) and cast roundup()'s return to (signed long), but
> that's not what we really want in this case because it's clear our PMD
> accounting is bogus.
> 
> Kiryl, any better idea on how to balance mm_inc_nr_pmds() and
> mm_dec_nr_pmds() ?

I assume it's on !LPAE kernel, right?

I did a quick look. ARM has folded PMD level in case of 2-level pages
tables, but it doesn't use standard approach -- pgtable-nopmd.h.
As result ARM doesn't have __PAGETABLE_PMD_FOLDED defined.

I will look further tomorrow, but I wounder if we can just define
__PAGETABLE_PMD_FOLDED in arch/arm/include/asm/pgtable-2level.h ?

This way we would also get rid of dead code -- __pmd_alloc() is never
called in this configuration. And fix the accounting issue: mm_*_nr_pmd()
helpers will become nop.

Better option would be converting 2-lvl ARM configuration to
<asm-generic/pgtable-nopmd.h>, but I'm not sure if it's possible.



More information about the linux-arm-kernel mailing list