am35xx memory management issues

Tony Lindgren tony at atomide.com
Thu Nov 12 09:06:59 PST 2015


Hi,

* Markku Ahvenjärvi <markku.ahvenjarvi at nomovok.com> [151112 07:26]:
> Hello everyone,
> 
> We have an am3517-based board and are experiencing sporadic corruption of mm structures. We've had this problem for months now and haven't really gotten to the bottom of it.
> 
> Our board is currently using 3.18.20, but with am3517-evm we've tried pretty much everything between v3.14 and v4.2. So far we've been able to reproduce it on am3517-evm, craneboard and beagleboard (rev. C3 and C4). We have also tested am/dm37x-evm, am335x-evm and BeagleBone Black, and saw no problems on those.
> 
> Usually the kernel panics with 'kernel BUG at mm/rmap.c:406!', but occasionally there are 'BUG: Bad rss-counter state' prints followed by a NULL pointer dereference or another BUG statement in mm/slab.c. Sometimes a spinlock lockup or an already-unlocked spinlock is reported, so it is quite random.
> 
> Reproducing it can take from half an hour up to a few days. We are using stress-ng with these options:
> stress-ng --cpu 1 --vm 3 --vm-bytes 64M --fork 4
> 
> In our tests we have noticed that the kernel configuration affects the frequency of the problem. So far we haven't seen any crashes with omap2plus_defconfig, but with a slimmer defconfig like the one we are using for our board we can get it in a few hours. We bisected between our defconfig and omap2plus_defconfig, but couldn't pinpoint any specific option that would cause these problems: it just got less frequent until it stopped occurring. To rule out any badly behaving drivers, we basically disabled everything but serial, and it just kept crashing.

Also adding LAKML to Cc. Can you check if it starts happening if you
leave out all omaps other than CONFIG_ARCH_OMAP3 from .config?
That way the code is compiled only for ARMv7, with ARMv6 left out.

Also please check if leaving out CONFIG_SMP_ON_UP affects things.
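
To confirm which of these options a given .config really enables before
and after the change, something like the following could be used. It is
only a sketch: the function names are invented for illustration, and
treating CONFIG_CPU_V6 / CONFIG_CPU_V6K as the sign that ARMv6 code is
being built is an assumption made here, not something stated in this
thread. Only CONFIG_ARCH_OMAP<n> symbols are counted as "omaps".

```python
import re

# CONFIG_ARCH_OMAP3 and CONFIG_SMP_ON_UP come from the mail above; the
# CPU_V6/CPU_V6K/CPU_V7 symbols are an assumption about which symbols
# reflect an ARMv6 vs ARMv7 build.
WATCHED = re.compile(r"^CONFIG_(ARCH_OMAP\d+|SMP_ON_UP|CPU_V6K?|CPU_V7)$")


def enabled_options(config_text):
    """Return the set of watched CONFIG_ symbols that are set to 'y'."""
    found = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Skips comments, including "# CONFIG_FOO is not set".
            continue
        name, _, value = line.partition("=")
        if value == "y" and WATCHED.match(name):
            found.add(name)
    return found


def summarize(config_text):
    """Condense a .config into the three questions asked in the reply."""
    opts = enabled_options(config_text)
    omaps = sorted(o for o in opts if o.startswith("CONFIG_ARCH_OMAP"))
    return {
        "omap3_only": omaps == ["CONFIG_ARCH_OMAP3"],
        "armv6_built": bool(opts & {"CONFIG_CPU_V6", "CONFIG_CPU_V6K"}),
        "smp_on_up": "CONFIG_SMP_ON_UP" in opts,
    }
```

Calling summarize() on the contents of the .config for both the crashing
and the non-crashing build gives a quick side-by-side of the three
settings in question.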

> Someone was having quite similar problems back in 2012, but other than that we've found nothing:
> http://thread.gmane.org/gmane.linux.ports.arm.omap/78039/
> 
> Has anyone seen this kind of issue before? Any ideas what might cause this?

If it starts happening after leaving out ARMv6 or SMP_ON_UP,
it could be a cache bug or a needed errata workaround that is missing.

Regards,

Tony


> [    0.000000] Booting Linux on physical CPU 0x0
> [    0.000000] Linux version 3.18.24 (markku at thinkpad) (gcc version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11) ) #2 PREEMPT Wed Nov 4 09:51:36 EET 2015
> [    0.000000] CPU: ARMv7 Processor [411fc087] revision 7 (ARMv7), cr=10c5387d
> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
> [    0.000000] Machine model: TI AM3517 EVM (AM3517/05 TMDSEVM3517)
> [    0.000000] cma: Reserved 8 MiB at 0x8f400000
> [    0.000000] Memory policy: Data cache writeback
> [    0.000000] On node 0 totalpages: 65280
> [    0.000000] free_area_init_node: node 0, pgdat c09be980, node_mem_map cfce7000
> [    0.000000]   Normal zone: 512 pages used for memmap
> [    0.000000]   Normal zone: 0 pages reserved
> [    0.000000]   Normal zone: 65280 pages, LIFO batch:15
> [    0.000000]   HighMem zone: 1048574 pages exceeds freesize 0
> [    0.000000] CPU: All CPU(s) started in SVC mode.
> [    0.000000] AM3517 ES1.1 (l2cache sgx neon )
> [    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
> [    0.000000] pcpu-alloc: [0] 0
> [    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64768
> [    0.000000] Kernel command line: console=ttyO2,115200
> [    0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
> [    0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
> [    0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
> [    0.000000] Memory: 239940K/261120K available (4809K kernel code, 341K rwdata, 1816K rodata, 2996K init, 353K bss, 21180K reserved, 0K highmem)
> [    0.000000] Virtual kernel memory layout:
> [    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
> [    0.000000]     fixmap  : 0xffc00000 - 0xffe00000   (2048 kB)
> [    0.000000]     vmalloc : 0xd0800000 - 0xff000000   ( 744 MB)
> [    0.000000]     lowmem  : 0xc0000000 - 0xd0000000   ( 256 MB)
> [    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
> [    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
> [    0.000000]       .text : 0xc0008000 - 0xc0680984   (6627 kB)
> [    0.000000]       .init : 0xc0681000 - 0xc096e000   (2996 kB)
> [    0.000000]       .data : 0xc096e000 - 0xc09c354c   ( 342 kB)
> [    0.000000]        .bss : 0xc09c354c - 0xc0a1b97c   ( 354 kB)
> [    0.000000] Preemptible hierarchical RCU implementation.
> [    0.000000] NR_IRQS:16 nr_irqs:16 16
> [    0.000000] IRQ: Found an INTC at 0xfa200000 (revision 4.0) with 96 interrupts
> [    0.000000] Clocking rate (Crystal/Core/MPU): 26.0/332/600 MHz
> [    0.000000] OMAP clockevent source: timer2 at 13000000 Hz
> [    0.000023] sched_clock: 32 bits at 13MHz, resolution 76ns, wraps every 330382100403ns
> [    0.000058] OMAP clocksource: timer1 at 13000000 Hz
> [    0.000598] Console: colour dummy device 80x30
> [    0.000635] Calibrating delay loop... 589.82 BogoMIPS (lpj=294912)
> [    0.008980] pid_max: default: 32768 minimum: 301
> [    0.009168] Security Framework initialized
> [    0.009264] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
> [    0.009282] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
> [    0.010313] CPU: Testing write buffer coherency: ok
> [    0.010936] Setting up static identity map for 0x80496c78 - 0x80496cd0
> [    0.013878] devtmpfs: initialized
> [    0.016530] VFP support v0.3: implementor 41 architecture 3 part 30 variant c rev 1
> [    0.038120] omap_hwmod: mcbsp2_sidetone using broken dt data from mcbsp
> [    0.038751] omap_hwmod: mcbsp3_sidetone using broken dt data from mcbsp
> [    0.082753] omap_hwmod: mcbsp2: cannot be enabled for reset (3)
> [    0.099153] pinctrl core: initialized pinctrl subsystem
> [    0.100179] regulator-dummy: no parameters
> [    0.134359] NET: Registered protocol family 16
> [    0.137058] DMA: preallocated 256 KiB pool for atomic coherent allocations
> [    0.146611] Reprogramming SDRC clock to 332000000 Hz
> [    0.149695] platform 480c5000.aes: Cannot lookup hwmod 'aes'
> [    0.156050] OMAP GPIO hardware version 2.5
> [    0.173473] platform 480c3000.sham: Cannot lookup hwmod 'sham'
> [    0.174042] platform 480cb000.smartreflex: Cannot lookup hwmod 'smartreflex_core'
> [    0.181773] omap-gpmc 6e000000.gpmc: GPMC revision 5.0
> [    0.182409] platform 480ab000.usb_otg_hs: Cannot lookup hwmod 'usb_otg_hs'
> [    0.185485] No ATAGs?
> [    0.185526] hw-breakpoint: debug architecture 0x4 unsupported.
> [    0.187801] OMAP DMA hardware revision 4.0
> [    0.248481] omap-dma-engine 48056000.dma-controller: OMAP DMA engine driver
> [    0.249924] vmmc_fixed: 3300 mV
> [    0.251923] SCSI subsystem initialized
> [    0.252848] usbcore: registered new interface driver usbfs
> [    0.253127] usbcore: registered new interface driver hub
> [    0.253330] usbcore: registered new device driver usb
> [    0.255867] omap_i2c 48070000.i2c: bus 0 rev3.3 at 400 kHz
> [    0.257215] omap_i2c 48072000.i2c: bus 1 rev3.3 at 400 kHz
> [    0.258330] omap_i2c 48060000.i2c: bus 2 rev3.3 at 400 kHz
> [    0.260815] Switched to clocksource timer1
> [    0.340661] NET: Registered protocol family 2
> [    0.342429] TCP established hash table entries: 2048 (order: 1, 8192 bytes)
> [    0.342506] TCP bind hash table entries: 2048 (order: 3, 40960 bytes)
> [    0.342604] TCP: Hash tables configured (established 2048 bind 2048)
> [    0.342743] TCP: reno registered
> [    0.342768] UDP hash table entries: 256 (order: 1, 12288 bytes)
> [    0.342879] UDP-Lite hash table entries: 256 (order: 1, 12288 bytes)
> [    0.343204] NET: Registered protocol family 1
> [    0.861358] hw perfevents: enabled with armv7_cortex_a8 PMU driver, 5 counters available
> [    0.867219] futex hash table entries: 256 (order: 0, 7168 bytes)
> [    0.870487] VFS: Disk quotas dquot_6.5.2
> [    0.870589] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
> [    0.871381] msgmni has been set to 484
> [    0.874913] io scheduler noop registered
> [    0.874948] io scheduler deadline registered
> [    0.875029] io scheduler cfq registered (default)
> [    0.877145] pinctrl-single 48002030.pinmux: 284 pins at pa fa002030 size 568
> [    0.877537] pinctrl-single 48002a00.pinmux: 46 pins at pa fa002a00 size 92
> [    0.880571] omap_uart 4806a000.serial: no wakeirq for uart0
> [    0.881110] 4806a000.serial: ttyO0 at MMIO 0x4806a000 (irq = 88, base_baud = 3000000) is a OMAP UART0
> [    0.882028] omap_uart 4806c000.serial: no wakeirq for uart0
> [    0.882573] 4806c000.serial: ttyO1 at MMIO 0x4806c000 (irq = 89, base_baud = 3000000) is a OMAP UART1
> [    0.883521] omap_uart 49020000.serial: no wakeirq for uart0
> [    0.883691] 49020000.serial: ttyO2 at MMIO 0x49020000 (irq = 90, base_baud = 3000000) is a OMAP UART2
> [    1.469044] console [ttyO2] enabled
> [    1.492339] brd: module loaded
> [    1.498629] mtdoops: mtd device (mtddev=name/number) must be supplied
> [    1.508182] usbcore: registered new interface driver asix
> [    1.514672] usbcore: registered new interface driver ax88179_178a
> [    1.522285] usbcore: registered new interface driver cdc_ether
> [    1.529444] usbcore: registered new interface driver smsc95xx
> [    1.536463] usbcore: registered new interface driver net1080
> [    1.543372] usbcore: registered new interface driver cdc_subset
> [    1.550618] usbcore: registered new interface driver cdc_ncm
> [    1.561182] omap_wdt: OMAP Watchdog Timer Rev 0x31: initial timeout 60 sec
> [    1.595009] usbcore: registered new interface driver usbhid
> [    1.601583] usbhid: USB HID core driver
> [    1.607206] oprofile: using arm/armv7
> [    1.611987] nf_conntrack version 0.5.0 (3877 buckets, 15508 max)
> [    1.619512] TCP: cubic registered
> [    1.623127] Initializing XFRM netlink socket
> [    1.627898] NET: Registered protocol family 17
> [    1.632751] NET: Registered protocol family 15
> [    1.637616] Key type dns_resolver registered
> [    1.642382] omap2_set_init_voltage: unable to find boot up OPP for vdd_mpu_iva
> [    1.650025] omap2_set_init_voltage: unable to set vdd_mpu_iva
> [    1.656119] omap2_set_init_voltage: unable to find boot up OPP for vdd_core
> [    1.663479] omap2_set_init_voltage: unable to set vdd_core
> [    1.670110] PM: no software I/O chain control; some wakeups may be lost
> [    1.677499] pm: Failed to request pm_wkup irq
> [    1.682230] ThumbEE CPU extension supported.
> [    1.686920] Registering SWP/SWPB emulation handler
> [    1.697176] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
> [    1.705634] mmc0: host does not support reading read-only switch, assuming write-enable
> [    1.721911] mmc0: new high speed SDHC card at address 0002
> [    1.737955] mmcblk0: mmc0:0002       3.81 GiB
> [    1.748383]  mmcblk0: p1 p2 p3
> [    1.756622] Warning: unable to open an initial console.
> [    1.772351] Freeing unused kernel memory: 2996K (c0681000 - c096e000)
> [    2.651221] udevd[643]: starting version 182
> [    4.101678] random: dd urandom read with 51 bits of entropy available
> [   15.397932] random: nonblocking pool is initialized
> [  382.789857] perf interrupt took too long (2535 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> [  755.387860] perf interrupt took too long (5004 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
> [ 4675.751682] ------------[ cut here ]------------
> [ 4675.814115] WARNING: CPU: 0 PID: 27573 at mm/rmap.c:226 unlink_anon_vmas+0x20c/0x21c()
> [ 4675.895950] Modules linked in:
> [ 4675.927371] CPU: 0 PID: 27573 Comm: stress-ng-fork Not tainted 3.18.24 #2
> [ 4676.007080] [<c00145b4>] (unwind_backtrace) from [<c0011e68>] (show_stack+0x10/0x14)
> [ 4676.089059] [<c0011e68>] (show_stack) from [<c0035824>] (warn_slowpath_common+0x70/0x88)
> [ 4676.172027] [<c0035824>] (warn_slowpath_common) from [<c00358d8>] (warn_slowpath_null+0x1c/0x24)
> [ 4676.266028] [<c00358d8>] (warn_slowpath_null) from [<c00ef6b8>] (unlink_anon_vmas+0x20c/0x21c)
> [ 4676.358081] [<c00ef6b8>] (unlink_anon_vmas) from [<c00e4658>] (free_pgtables+0x78/0xcc)
> [ 4676.441074] [<c00e4658>] (free_pgtables) from [<c00ec624>] (exit_mmap+0xf0/0x230)
> [ 4676.521016] [<c00ec624>] (exit_mmap) from [<c003313c>] (mmput+0x50/0xec)
> [ 4676.593103] [<c003313c>] (mmput) from [<c0036434>] (do_exit+0x25c/0x9d0)
> [ 4676.665045] [<c0036434>] (do_exit) from [<c00379e8>] (do_group_exit+0x3c/0xb0)
> [ 4676.741161] [<c00379e8>] (do_group_exit) from [<c0037a6c>] (__wake_up_parent+0x0/0x18)
> [ 4676.824005] ---[ end trace 216df8b29a401aa4 ]---
> [ 4676.875157] ------------[ cut here ]------------
> [ 4676.880036] kernel BUG at mm/rmap.c:406!
> [ 4676.884144] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
> [ 4676.889889] Modules linked in:
> [ 4676.893107] CPU: 0 PID: 27573 Comm: stress-ng-fork Tainted: G        W      3.18.24 #2
> [ 4676.901400] task: cf220c80 ti: ce072000 task.ti: ce072000
> [ 4676.907077] PC is at unlink_anon_vmas+0x1dc/0x21c
> [ 4676.912007] LR is at unlink_anon_vmas+0x104/0x21c
> [ 4676.916935] pc : [<c00ef688>]    lr : [<c00ef5b0>]    psr: 200c0013
> [ 4676.916935] sp : ce073e80  ip : 00000000  fp : c09c13c6
> [ 4676.928949] r10: ce19c8c8  r9 : ce19c8fc  r8 : ce19c904
> [ 4676.934419] r7 : ce0780e8  r6 : ce1ceaa0  r5 : c09fc620  r4 : ce1ceaa0
> [ 4676.941251] r3 : 00000004  r2 : ffff0001  r1 : 00000000  r0 : ce0eb568
> [ 4676.948086] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [ 4676.955556] Control: 10c5387d  Table: 8e100019  DAC: 00000015
> [ 4676.961571] Process stress-ng-fork (pid: 27573, stack limit = 0xce072238)
> [ 4676.968677] Stack: (0xce073e80 to 0xce074000)
> [ 4676.973246] 3e80: 00000000 ce0eb568 cf2b893c ce19c8c8 ce1c7768 4a5c8000 ce073ed8 00002000
> [ 4676.981811] 3ea0: 00000000 ce068040 ce068084 c00e4658 4a5c8000 c00e6538 00000000 ce183c90
> [ 4676.990376] 3ec0: ce073f00 ce068040 000000f8 c000e8a4 00000001 c00ec624 ce068040 00000001
> [ 4676.998940] 3ee0: 00000000 00000000 ffffffff b6f5a070 ffffffec 000000c1 00000400 ce175000
> [ 4677.007505] 3f00: c0994c78 cf220c80 cf220c80 ce072008 000000f8 ce068040 00000000 ce072008
> [ 4677.016069] 3f20: 000000f8 ce068040 00000000 ce072008 000000f8 c003313c cf221104 cf220c80
> [ 4677.024634] 3f40: ce072008 c0036434 be9d6ea4 c0068a90 cf006940 cf00699c ce072030 00000036
> [ 4677.033198] 3f60: c09a2d94 00000000 ce072000 ce1d2800 000000f8 c000e8a4 ce072000 00000000
> [ 4677.041762] 3f80: 0005bb68 c00379e8 00000000 00000000 0005bb58 000000f8 c000e8a4 c0037a6c
> [ 4677.050326] 3fa0: 00000000 c000e720 00000000 00000000 00000000 00000000 00000000 4a72c468
> [ 4677.058890] 3fc0: 00000000 00000000 0005bb58 000000f8 00000001 00000000 be9d6ed0 0005bb68
> [ 4677.067455] 3fe0: 4a695e80 be9d6ea4 0001b21c 4a695e90 60060010 00000000 00000000 00000000
> [ 4677.076039] [<c00ef688>] (unlink_anon_vmas) from [<c00e4658>] (free_pgtables+0x78/0xcc)
> [ 4677.084430] [<c00e4658>] (free_pgtables) from [<c00ec624>] (exit_mmap+0xf0/0x230)
> [ 4677.092275] [<c00ec624>] (exit_mmap) from [<c003313c>] (mmput+0x50/0xec)
> [ 4677.099302] [<c003313c>] (mmput) from [<c0036434>] (do_exit+0x25c/0x9d0)
> [ 4677.106326] [<c0036434>] (do_exit) from [<c00379e8>] (do_group_exit+0x3c/0xb0)
> [ 4677.113895] [<c00379e8>] (do_group_exit) from [<c0037a6c>] (__wake_up_parent+0x0/0x18)
> [ 4677.122189] Code: 0a000009 e2820004 ebfdc186 eaffffb2 (e7f001f2)
> [ 4677.128597] ---[ end trace 216df8b29a401aa5 ]---
> [ 4677.133435] Kernel panic - not syncing: Fatal exception
> [ 4677.138911] ---[ end Kernel panic - not syncing: Fatal exception
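
Since the failures take hours to days to appear and show up in several
forms, a captured console log can be scanned for the signatures quoted
above. The sketch below is an assumption-laden illustration: the
signature list only covers strings that appear in this report, and the
names SIGNATURES and scan are made up here.

```python
import re

# Patterns for the messages quoted in the report and the log above.
SIGNATURES = {
    "kernel_bug": re.compile(r"kernel BUG at (\S+?):(\d+)!"),
    "warning": re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+?):(\d+)"),
    "bad_rss": re.compile(r"BUG: Bad rss-counter state"),
    "panic": re.compile(r"Kernel panic - not syncing"),
}

# Matches printk timestamps such as "[ 4675.814115]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def scan(log_text):
    """Yield (seconds, kind, detail) for each line with a known signature.

    seconds is None when the line carries no printk timestamp; detail is
    "file:line" for BUG/WARNING entries and "" otherwise.
    """
    for line in log_text.splitlines():
        ts = TIMESTAMP.search(line)
        seconds = float(ts.group(1)) if ts else None
        for kind, pattern in SIGNATURES.items():
            m = pattern.search(line)
            if m:
                detail = ":".join(m.groups()) if m.groups() else ""
                yield seconds, kind, detail
                break
```

Running list(scan(text)) over logs from several runs makes it easier to
compare time-to-failure and which check fired first across kernel
configurations.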