am35xx memory management issues
Markku Ahvenjärvi
markku.ahvenjarvi at nomovok.com
Fri Nov 13 05:05:42 PST 2015
Hi,
On 12.11.2015 19:06, Tony Lindgren wrote:
> Hi,
>
> * Markku Ahvenjärvi <markku.ahvenjarvi at nomovok.com> [151112 07:26]:
>> Hello everyone,
>>
>> We have an am3517-based board and are experiencing sporadic corruption of mm structures. We've had this problem for months now and haven't really gotten to the bottom of it.
>>
>> Our board is currently using 3.18.20, but with am3517-evm we've tried pretty much everything between v3.14 and v4.2. So far we've been able to reproduce it on am3517-evm, craneboard and beagleboard (rev. C3 and C4). We have also tested am/dm37x-evm, am335x-evm and BeagleBone Black, with no problems seen.
>>
>> Usually the kernel panics with 'kernel BUG at mm/rmap.c:406!', but occasionally there are 'BUG: Bad rss-counter state' prints followed by a NULL pointer dereference or another BUG statement in mm/slab.c. Sometimes a spinlock lockup or an already-unlocked spinlock is reported, so it is quite random.
>>
>> Reproducing can take from half an hour up to a few days. We are using stress-ng with options:
>> stress-ng --cpu 1 --vm 3 --vm-bytes 64M --fork 4
>>
>> In our tests we have noticed that the kernel configuration affects the frequency of the problem. So far we haven't seen any crashes with omap2plus_defconfig, but with a slimmer defconfig like the one we are using for our board we can get it in a few hours. We bisected between our defconfig and omap2plus_defconfig, but couldn't pinpoint any specific option that would cause these problems: it just got less frequent until it stopped occurring. To rule out any badly behaving drivers, we disabled basically everything but serial, and it just kept crashing.
>
> Adding also LAKML to Cc. Can you check if it starts happening if you
> leave out other omaps from .config other than CONFIG_ARCH_OMAP3?
> That's to compile code only for ARMv7 and leave out ARMv6.
>
> Also please check if leaving out CONFIG_SMP_ON_UP affects things.
Alright, will do.
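To keep track of which test kernel was built with which of these options, something like the following could be run against each .config. This is only a minimal sketch: CONFIG_ARCH_OMAP3 and CONFIG_SMP_ON_UP are the symbols named above, while CONFIG_ARCH_OMAP2 and CONFIG_CPU_V6 are my assumption about which symbols indicate that ARMv6 code is being built, and the helper names are made up for illustration.

```python
import re

# The first two symbols come from the thread. The last two are assumptions
# about which symbols pull in ARMv6 support; adjust for the tree in use.
OPTIONS = [
    "CONFIG_ARCH_OMAP3",
    "CONFIG_SMP_ON_UP",
    "CONFIG_ARCH_OMAP2",  # assumption
    "CONFIG_CPU_V6",      # assumption
]


def parse_config(text):
    """Return {symbol: value} for every option that is set in .config text."""
    values = {}
    for line in text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


def report(text, options=OPTIONS):
    """Map each option of interest to its value, or "n" when it is not set."""
    config = parse_config(text)
    return {option: config.get(option, "n") for option in options}
```

Calling report(open(".config").read()) for each build gives a small table that can be stored next to the crash logs.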
>> Someone was having quite similar problems back in 2012, but other than that we've found nothing:
>> http://thread.gmane.org/gmane.linux.ports.arm.omap/78039/
>>
>> Has anyone seen this kind of issue before? Any ideas what might cause this?
>
> If it starts happening after leaving out ARMv6 or SMP_ON_UP,
> it could be a cache bug or missing errata that's needed.
Right.
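Since the symptoms vary from run to run, it may help to tally which failure signatures show up in the serial logs for each configuration. Below is a rough sketch of how that could be done. The patterns are my assumptions based on the messages described in the original report, and the names are illustrative only.

```python
import re
from collections import Counter

# Assumed patterns for the symptoms described in the original report.
SIGNATURES = {
    "rmap_bug": r"kernel BUG at mm/rmap\.c",
    "slab_bug": r"kernel BUG at mm/slab\.c",
    "bad_rss": r"BUG: Bad rss-counter state",
    "null_deref": r"NULL pointer dereference",
    "spinlock": r"BUG: spinlock",
}


def tally(log_text):
    """Count how many log lines match each failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            # re.search is used so that timestamps or quote prefixes
            # in front of the message do not matter.
            if re.search(pattern, line):
                counts[name] += 1
    return counts
```

Running tally() over the captured console output of each test kernel would show whether a config change shifts the mix of failures or only their frequency.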
Regards,
Markku
>
> Regards,
>
> Tony
>
>
>> [ 0.000000] Booting Linux on physical CPU 0x0
>> [ 0.000000] Linux version 3.18.24 (markku at thinkpad) (gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11) ) #2 PREEMPT Wed Nov 4 09:51:36 EET 2015
>> [ 0.000000] CPU: ARMv7 Processor [411fc087] revision 7 (ARMv7), cr=10c5387d
>> [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
>> [ 0.000000] Machine model: TI AM3517 EVM (AM3517/05 TMDSEVM3517)
>> [ 0.000000] cma: Reserved 8 MiB at 0x8f400000
>> [ 0.000000] Memory policy: Data cache writeback
>> [ 0.000000] On node 0 totalpages: 65280
>> [ 0.000000] free_area_init_node: node 0, pgdat c09be980, node_mem_map cfce7000
>> [ 0.000000] Normal zone: 512 pages used for memmap
>> [ 0.000000] Normal zone: 0 pages reserved
>> [ 0.000000] Normal zone: 65280 pages, LIFO batch:15
>> [ 0.000000] HighMem zone: 1048574 pages exceeds freesize 0
>> [ 0.000000] CPU: All CPU(s) started in SVC mode.
>> [ 0.000000] AM3517 ES1.1 (l2cache sgx neon )
>> [ 0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
>> [ 0.000000] pcpu-alloc: [0] 0
>> [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64768
>> [ 0.000000] Kernel command line: console=ttyO2,115200
>> [ 0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
>> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
>> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
>> [ 0.000000] Memory: 239940K/261120K available (4809K kernel code, 341K rwdata, 1816K rodata, 2996K init, 353K bss, 21180K reserved, 0K highmem)
>> [ 0.000000] Virtual kernel memory layout:
>> [ 0.000000] vector : 0xffff0000 - 0xffff1000 ( 4 kB)
>> [ 0.000000] fixmap : 0xffc00000 - 0xffe00000 (2048 kB)
>> [ 0.000000] vmalloc : 0xd0800000 - 0xff000000 ( 744 MB)
>> [ 0.000000] lowmem : 0xc0000000 - 0xd0000000 ( 256 MB)
>> [ 0.000000] pkmap : 0xbfe00000 - 0xc0000000 ( 2 MB)
>> [ 0.000000] modules : 0xbf000000 - 0xbfe00000 ( 14 MB)
>> [ 0.000000] .text : 0xc0008000 - 0xc0680984 (6627 kB)
>> [ 0.000000] .init : 0xc0681000 - 0xc096e000 (2996 kB)
>> [ 0.000000] .data : 0xc096e000 - 0xc09c354c ( 342 kB)
>> [ 0.000000] .bss : 0xc09c354c - 0xc0a1b97c ( 354 kB)
>> [ 0.000000] Preemptible hierarchical RCU implementation.
>> [ 0.000000] NR_IRQS:16 nr_irqs:16 16
>> [ 0.000000] IRQ: Found an INTC at 0xfa200000 (revision 4.0) with 96 interrupts
>> [ 0.000000] Clocking rate (Crystal/Core/MPU): 26.0/332/600 MHz
>> [ 0.000000] OMAP clockevent source: timer2 at 13000000 Hz
>> [ 0.000023] sched_clock: 32 bits at 13MHz, resolution 76ns, wraps every 330382100403ns
>> [ 0.000058] OMAP clocksource: timer1 at 13000000 Hz
>> [ 0.000598] Console: colour dummy device 80x30
>> [ 0.000635] Calibrating delay loop... 589.82 BogoMIPS (lpj=294912)
>> [ 0.008980] pid_max: default: 32768 minimum: 301
>> [ 0.009168] Security Framework initialized
>> [ 0.009264] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
>> [ 0.009282] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
>> [ 0.010313] CPU: Testing write buffer coherency: ok
>> [ 0.010936] Setting up static identity map for 0x80496c78 - 0x80496cd0
>> [ 0.013878] devtmpfs: initialized
>> [ 0.016530] VFP support v0.3: implementor 41 architecture 3 part 30 variant c rev 1
>> [ 0.038120] omap_hwmod: mcbsp2_sidetone using broken dt data from mcbsp
>> [ 0.038751] omap_hwmod: mcbsp3_sidetone using broken dt data from mcbsp
>> [ 0.082753] omap_hwmod: mcbsp2: cannot be enabled for reset (3)
>> [ 0.099153] pinctrl core: initialized pinctrl subsystem
>> [ 0.100179] regulator-dummy: no parameters
>> [ 0.134359] NET: Registered protocol family 16
>> [ 0.137058] DMA: preallocated 256 KiB pool for atomic coherent allocations
>> [ 0.146611] Reprogramming SDRC clock to 332000000 Hz
>> [ 0.149695] platform 480c5000.aes: Cannot lookup hwmod 'aes'
>> [ 0.156050] OMAP GPIO hardware version 2.5
>> [ 0.173473] platform 480c3000.sham: Cannot lookup hwmod 'sham'
>> [ 0.174042] platform 480cb000.smartreflex: Cannot lookup hwmod 'smartreflex_core'
>> [ 0.181773] omap-gpmc 6e000000.gpmc: GPMC revision 5.0
>> [ 0.182409] platform 480ab000.usb_otg_hs: Cannot lookup hwmod 'usb_otg_hs'
>> [ 0.185485] No ATAGs?
>> [ 0.185526] hw-breakpoint: debug architecture 0x4 unsupported.
>> [ 0.187801] OMAP DMA hardware revision 4.0
>> [ 0.248481] omap-dma-engine 48056000.dma-controller: OMAP DMA engine driver
>> [ 0.249924] vmmc_fixed: 3300 mV
>> [ 0.251923] SCSI subsystem initialized
>> [ 0.252848] usbcore: registered new interface driver usbfs
>> [ 0.253127] usbcore: registered new interface driver hub
>> [ 0.253330] usbcore: registered new device driver usb
>> [ 0.255867] omap_i2c 48070000.i2c: bus 0 rev3.3 at 400 kHz
>> [ 0.257215] omap_i2c 48072000.i2c: bus 1 rev3.3 at 400 kHz
>> [ 0.258330] omap_i2c 48060000.i2c: bus 2 rev3.3 at 400 kHz
>> [ 0.260815] Switched to clocksource timer1
>> [ 0.340661] NET: Registered protocol family 2
>> [ 0.342429] TCP established hash table entries: 2048 (order: 1, 8192 bytes)
>> [ 0.342506] TCP bind hash table entries: 2048 (order: 3, 40960 bytes)
>> [ 0.342604] TCP: Hash tables configured (established 2048 bind 2048)
>> [ 0.342743] TCP: reno registered
>> [ 0.342768] UDP hash table entries: 256 (order: 1, 12288 bytes)
>> [ 0.342879] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes)
>> [ 0.343204] NET: Registered protocol family 1
>> [ 0.861358] hw perfevents: enabled with armv7_cortex_a8 PMU driver, 5 counters available
>> [ 0.867219] futex hash table entries: 256 (order: 0, 7168 bytes)
>> [ 0.870487] VFS: Disk quotas dquot_6.5.2
>> [ 0.870589] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
>> [ 0.871381] msgmni has been set to 484
>> [ 0.874913] io scheduler noop registered
>> [ 0.874948] io scheduler deadline registered
>> [ 0.875029] io scheduler cfq registered (default)
>> [ 0.877145] pinctrl-single 48002030.pinmux: 284 pins at pa fa002030 size 568
>> [ 0.877537] pinctrl-single 48002a00.pinmux: 46 pins at pa fa002a00 size 92
>> [ 0.880571] omap_uart 4806a000.serial: no wakeirq for uart0
>> [ 0.881110] 4806a000.serial: ttyO0 at MMIO 0x4806a000 (irq = 88, base_baud = 3000000) is a OMAP UART0
>> [ 0.882028] omap_uart 4806c000.serial: no wakeirq for uart0
>> [ 0.882573] 4806c000.serial: ttyO1 at MMIO 0x4806c000 (irq = 89, base_baud = 3000000) is a OMAP UART1
>> [ 0.883521] omap_uart 49020000.serial: no wakeirq for uart0
>> [ 0.883691] 49020000.serial: ttyO2 at MMIO 0x49020000 (irq = 90, base_baud = 3000000) is a OMAP UART2
>> [ 1.469044] console [ttyO2] enabled
>> [ 1.492339] brd: module loaded
>> [ 1.498629] mtdoops: mtd device (mtddev=name/number) must be supplied
>> [ 1.508182] usbcore: registered new interface driver asix
>> [ 1.514672] usbcore: registered new interface driver ax88179_178a
>> [ 1.522285] usbcore: registered new interface driver cdc_ether
>> [ 1.529444] usbcore: registered new interface driver smsc95xx
>> [ 1.536463] usbcore: registered new interface driver net1080
>> [ 1.543372] usbcore: registered new interface driver cdc_subset
>> [ 1.550618] usbcore: registered new interface driver cdc_ncm
>> [ 1.561182] omap_wdt: OMAP Watchdog Timer Rev 0x31: initial timeout 60 sec
>> [ 1.595009] usbcore: registered new interface driver usbhid
>> [ 1.601583] usbhid: USB HID core driver
>> [ 1.607206] oprofile: using arm/armv7
>> [ 1.611987] nf_conntrack version 0.5.0 (3877 buckets, 15508 max)
>> [ 1.619512] TCP: cubic registered
>> [ 1.623127] Initializing XFRM netlink socket
>> [ 1.627898] NET: Registered protocol family 17
>> [ 1.632751] NET: Registered protocol family 15
>> [ 1.637616] Key type dns_resolver registered
>> [ 1.642382] omap2_set_init_voltage: unable to find boot up OPP for vdd_mpu_iva
>> [ 1.650025] omap2_set_init_voltage: unable to set vdd_mpu_iva
>> [ 1.656119] omap2_set_init_voltage: unable to find boot up OPP for vdd_core
>> [ 1.663479] omap2_set_init_voltage: unable to set vdd_core
>> [ 1.670110] PM: no software I/O chain control; some wakeups may be lost
>> [ 1.677499] pm: Failed to request pm_wkup irq
>> [ 1.682230] ThumbEE CPU extension supported.
>> [ 1.686920] Registering SWP/SWPB emulation handler
>> [ 1.697176] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
>> [ 1.705634] mmc0: host does not support reading read-only switch, assuming write-enable
>> [ 1.721911] mmc0: new high speed SDHC card at address 0002
>> [ 1.737955] mmcblk0: mmc0:0002 3.81 GiB
>> [ 1.748383] mmcblk0: p1 p2 p3
>> [ 1.756622] Warning: unable to open an initial console.
>> [ 1.772351] Freeing unused kernel memory: 2996K (c0681000 - c096e000)
>> [ 2.651221] udevd[643]: starting version 182
>> [ 4.101678] random: dd urandom read with 51 bits of entropy available
>> [ 15.397932] random: nonblocking pool is initialized
>> [ 382.789857] perf interrupt took too long (2535 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
>> [ 755.387860] perf interrupt took too long (5004 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
>> [ 4675.751682] ------------[ cut here ]------------
>> [ 4675.814115] WARNING: CPU: 0 PID: 27573 at mm/rmap.c:226 unlink_anon_vmas+0x20c/0x21c()
>> [ 4675.895950] Modules linked in:
>> [ 4675.927371] CPU: 0 PID: 27573 Comm: stress-ng-fork Not tainted 3.18.24 #2
>> [ 4676.007080] [<c00145b4>] (unwind_backtrace) from [<c0011e68>] (show_stack+0x10/0x14)
>> [ 4676.089059] [<c0011e68>] (show_stack) from [<c0035824>] (warn_slowpath_common+0x70/0x88)
>> [ 4676.172027] [<c0035824>] (warn_slowpath_common) from [<c00358d8>] (warn_slowpath_null+0x1c/0x24)
>> [ 4676.266028] [<c00358d8>] (warn_slowpath_null) from [<c00ef6b8>] (unlink_anon_vmas+0x20c/0x21c)
>> [ 4676.358081] [<c00ef6b8>] (unlink_anon_vmas) from [<c00e4658>] (free_pgtables+0x78/0xcc)
>> [ 4676.441074] [<c00e4658>] (free_pgtables) from [<c00ec624>] (exit_mmap+0xf0/0x230)
>> [ 4676.521016] [<c00ec624>] (exit_mmap) from [<c003313c>] (mmput+0x50/0xec)
>> [ 4676.593103] [<c003313c>] (mmput) from [<c0036434>] (do_exit+0x25c/0x9d0)
>> [ 4676.665045] [<c0036434>] (do_exit) from [<c00379e8>] (do_group_exit+0x3c/0xb0)
>> [ 4676.741161] [<c00379e8>] (do_group_exit) from [<c0037a6c>] (__wake_up_parent+0x0/0x18)
>> [ 4676.824005] ---[ end trace 216df8b29a401aa4 ]---
>> [ 4676.875157] ------------[ cut here ]------------
>> [ 4676.880036] kernel BUG at mm/rmap.c:406!
>> [ 4676.884144] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
>> [ 4676.889889] Modules linked in:
>> [ 4676.893107] CPU: 0 PID: 27573 Comm: stress-ng-fork Tainted: G W 3.18.24 #2
>> [ 4676.901400] task: cf220c80 ti: ce072000 task.ti: ce072000
>> [ 4676.907077] PC is at unlink_anon_vmas+0x1dc/0x21c
>> [ 4676.912007] LR is at unlink_anon_vmas+0x104/0x21c
>> [ 4676.916935] pc : [<c00ef688>] lr : [<c00ef5b0>] psr: 200c0013
>> [ 4676.916935] sp : ce073e80 ip : 00000000 fp : c09c13c6
>> [ 4676.928949] r10: ce19c8c8 r9 : ce19c8fc r8 : ce19c904
>> [ 4676.934419] r7 : ce0780e8 r6 : ce1ceaa0 r5 : c09fc620 r4 : ce1ceaa0
>> [ 4676.941251] r3 : 00000004 r2 : ffff0001 r1 : 00000000 r0 : ce0eb568
>> [ 4676.948086] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
>> [ 4676.955556] Control: 10c5387d Table: 8e100019 DAC: 00000015
>> [ 4676.961571] Process stress-ng-fork (pid: 27573, stack limit = 0xce072238)
>> [ 4676.968677] Stack: (0xce073e80 to 0xce074000)
>> [ 4676.973246] 3e80: 00000000 ce0eb568 cf2b893c ce19c8c8 ce1c7768 4a5c8000 ce073ed8 00002000
>> [ 4676.981811] 3ea0: 00000000 ce068040 ce068084 c00e4658 4a5c8000 c00e6538 00000000 ce183c90
>> [ 4676.990376] 3ec0: ce073f00 ce068040 000000f8 c000e8a4 00000001 c00ec624 ce068040 00000001
>> [ 4676.998940] 3ee0: 00000000 00000000 ffffffff b6f5a070 ffffffec 000000c1 00000400 ce175000
>> [ 4677.007505] 3f00: c0994c78 cf220c80 cf220c80 ce072008 000000f8 ce068040 00000000 ce072008
>> [ 4677.016069] 3f20: 000000f8 ce068040 00000000 ce072008 000000f8 c003313c cf221104 cf220c80
>> [ 4677.024634] 3f40: ce072008 c0036434 be9d6ea4 c0068a90 cf006940 cf00699c ce072030 00000036
>> [ 4677.033198] 3f60: c09a2d94 00000000 ce072000 ce1d2800 000000f8 c000e8a4 ce072000 00000000
>> [ 4677.041762] 3f80: 0005bb68 c00379e8 00000000 00000000 0005bb58 000000f8 c000e8a4 c0037a6c
>> [ 4677.050326] 3fa0: 00000000 c000e720 00000000 00000000 00000000 00000000 00000000 4a72c468
>> [ 4677.058890] 3fc0: 00000000 00000000 0005bb58 000000f8 00000001 00000000 be9d6ed0 0005bb68
>> [ 4677.067455] 3fe0: 4a695e80 be9d6ea4 0001b21c 4a695e90 60060010 00000000 00000000 00000000
>> [ 4677.076039] [<c00ef688>] (unlink_anon_vmas) from [<c00e4658>] (free_pgtables+0x78/0xcc)
>> [ 4677.084430] [<c00e4658>] (free_pgtables) from [<c00ec624>] (exit_mmap+0xf0/0x230)
>> [ 4677.092275] [<c00ec624>] (exit_mmap) from [<c003313c>] (mmput+0x50/0xec)
>> [ 4677.099302] [<c003313c>] (mmput) from [<c0036434>] (do_exit+0x25c/0x9d0)
>> [ 4677.106326] [<c0036434>] (do_exit) from [<c00379e8>] (do_group_exit+0x3c/0xb0)
>> [ 4677.113895] [<c00379e8>] (do_group_exit) from [<c0037a6c>] (__wake_up_parent+0x0/0x18)
>> [ 4677.122189] Code: 0a000009 e2820004 ebfdc186 eaffffb2 (e7f001f2)
>> [ 4677.128597] ---[ end trace 216df8b29a401aa5 ]---
>> [ 4677.133435] Kernel panic - not syncing: Fatal exception
>> [ 4677.138911] ---[ end Kernel panic - not syncing: Fatal exception