Crash when booting RPi2 / bcm2836 with ftrace=function

Matthias Reichl hias at horus.com
Wed Aug 23 08:42:14 PDT 2017


On Wed, Aug 23, 2017 at 10:29:31AM -0400, Steven Rostedt wrote:
> On Mon, 21 Aug 2017 17:34:02 +0200
> Matthias Reichl <hias at horus.com> wrote:
> 
> > When I enable the function tracer during boot on Raspberry Pi 2
> > /bcm2836) I get a kernel crash very early in the boot process:
> 
> This is because I moved up the start time of function tracing to allow
> to trace earlier in the boot process.
> 
> See commit f631718de3ca "ftrace: Move ftrace_init() to right after
> memory initialization"
> 
> And commit e725c731e3b "tracing: Split tracing initialization into two
> for early initialization" because people wanted to trace earlier in boot up.
> 
> I was nervous that some archs may take issue with this.
> 
> > 
> > [    0.000000] Unable to handle kernel paging request at virtual address c11d00e8
> > [    0.000000] pgd = c0204000
> > [    0.000000] [c11d00e8] *pgd=0101940e(bad)
> > [    0.000000] Internal error: Oops: 80d [#1] SMP ARM
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.8 #1
> > [    0.000000] Hardware name: BCM2835
> > [    0.000000] task: c1608100 task.stack: c1600000
> > [    0.000000] PC is at smp_set_ops+0x2c/0x3c
> > [    0.000000] LR is at 0xc11d00e8
> > [    0.000000] pc : [<c14058c4>]    lr : [<c11d00e8>]    psr: a00000d3
> > [    0.000000] sp : c1601f08  ip : c153af40  fp : c1601f14
> > [    0.000000] r10: c1601f48  r9 : 00000200  r8 : 00000100
> > [    0.000000] r7 : c1601f50  r6 : ea9f7660  r5 : 00000000  r4 : c16043bc
> > [    0.000000] r3 : c14451c4  r2 : 00000000  r1 : 00000000  r0 : 00000000
> > [    0.000000] Flags: NzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> > [    0.000000] Control: 10c5387d  Table: 0020406a  DAC: 00000051
> > [    0.000000] Process swapper/0 (pid: 0, stack limit = 0xc1600220)
> > [    0.000000] Stack: (0xc1601f08 to 0xc1602000)
> > [    0.000000] 1f00:                   c1601f3c c1601f18 c1445374 c14058a4 c068d488 00000000
> > [    0.000000] 1f20: ea9f7660 ea244300 c1601f48 00000000 c1601f84 c1601f40 c14b78e8 c1445220
> > [    0.000000] 1f40: 00000010 00000000 ea2442c0 ea2442c0 c1601f50 c1601f50 c065d450 c14dca20
> > [    0.000000] 1f60: c1780cc0 c1603c40 ffffffff c1780cc0 c14dca30 eafff940 c1601f94 c1601f88
> > [    0.000000] 1f80: c1444fac c14b7780 c1601fac c1601f98 c1403994 c1444f98 00000000 00000000
> > [    0.000000] 1fa0: c1601ff4 c1601fb0 c1400c54 c140396c ffffffff ffffffff 00000000 c1400750
> > [    0.000000] 1fc0: 00000000 c14dca30 00000000 c1781294 c1603c58 c14dca2c c16094e4 0020406a
> > [    0.000000] 1fe0: 410fc075 00000000 00000000 c1601ff8 0020807c c1400a18 00000000 00000000
> > [    0.000000] [<c14058c4>] (smp_set_ops) from [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init+0x160/0x180)
> > [    0.000000] [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init) from [<c14b78e8>] (of_irq_init+0x174/0x2c4)
> > [    0.000000] [<c14b78e8>] (of_irq_init) from [<c1444fac>] (irqchip_init+0x20/0x24)
> > [    0.000000] [<c1444fac>] (irqchip_init) from [<c1403994>] (init_IRQ+0x34/0x98)
> > [    0.000000] [<c1403994>] (init_IRQ) from [<c1400c54>] (start_kernel+0x248/0x3a8)
> > [    0.000000] [<c1400c54>] (start_kernel) from [<0020807c>] (0x20807c)
> > [    0.000000] Code: e1a0c000 e300e0e8 e8bc000f e34ce11d (e8ae000f) 
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
> > 
> > Full log is here: https://gist.github.com/HiassofT/45c4b03f4663a6d19e7a79cfbfd9c560
> > 
> > This was on 4.12.8, multi_v7_defconfig plus CONFIG_FUNCTION_TRACER
> > selected in menuconfig and ftrace_filter=__request_module ftrace=function
> > added to kernel command line.
> > 
> > The same crash also occurs on 4.13-rc5 and with only
> > ftrace=function on the kernel command line.
> > 
> > Enabling the function tracer at runtime via
> > echo function > /sys/kernel/tracing/current_tracer
> > works as expected.
> > 
> > With kernel 4.9.44 the crash is a little bit different:
> 
> Interesting, because my updates happened in 4.12.
> 
> > 
> > [    0.816852] Unable to handle kernel paging request at virtual address c11a4108
> > [    0.824490] pgd = c0204000
> > [    0.827382] [c11a4108] *pgd=0101940e(bad)
> > [    0.831714] Internal error: Oops: 80d [#1] SMP ARM
> > [    0.836605] Modules linked in:
> > [    0.839743] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.44 #1
> > [    0.845781] Hardware name: BCM2835
> > [    0.849257] task: ea288000 task.stack: ea284000
> > [    0.853899] PC is at vdso_init+0xbc/0x1b8
> 
> Interesting. vdso should not be traced, but vdso_init can be. What was
> the call that it was making? That is, have an objdump of these
> locations?

Here's the disassembly of vdso_init:

Dump of assembler code for function vdso_init:
   0xc13065f4 <+0>:	mov	r12, sp
   0xc13065f8 <+4>:	push	{r4, r5, r6, r7, r8, r11, r12, lr, pc}
   0xc13065fc <+8>:	sub	r11, r12, #4
   0xc1306600 <+12>:	sub	sp, sp, #20
   0xc1306604 <+16>:	movw	r4, #12288	; 0x3000
   0xc1306608 <+20>:	movt	r4, #49434	; 0xc11a
   0xc130660c <+24>:	movw	r1, #3676	; 0xe5c
   0xc1306610 <+28>:	movt	r1, #49416	; 0xc108
   0xc1306614 <+32>:	mov	r0, r4
   0xc1306618 <+36>:	mov	r2, #4
   0xc130661c <+40>:	mov	r5, r4
   0xc1306620 <+44>:	bl	0xc063935c <memcmp>
   0xc1306624 <+48>:	subs	r6, r0, #0
   0xc1306628 <+52>:	beq	0xc1306640 <vdso_init+76>
   0xc130662c <+56>:	movw	r0, #3684	; 0xe64
   0xc1306630 <+60>:	mvn	r6, #7
   0xc1306634 <+64>:	movt	r0, #49416	; 0xc108
   0xc1306638 <+68>:	bl	0xc04174d8 <printk>
   0xc130663c <+72>:	b	0xc130679c <vdso_init+424>
   0xc1306640 <+76>:	movw	r3, #16384	; 0x4000
   0xc1306644 <+80>:	movt	r3, #49434	; 0xc11a
   0xc1306648 <+84>:	rsb	r3, r4, r3
   0xc130664c <+88>:	asr	r4, r3, #12
   0xc1306650 <+92>:	cmn	r4, #-1073741823	; 0xc0000001
   0xc1306654 <+96>:	movhi	r12, r6
   0xc1306658 <+100>:	bhi	0xc1306670 <vdso_init+124>
   0xc130665c <+104>:	movw	r1, #32960	; 0x80c0
   0xc1306660 <+108>:	lsl	r0, r4, #2
   0xc1306664 <+112>:	movt	r1, #576	; 0x240
   0xc1306668 <+116>:	bl	0xc0465f48 <__kmalloc>
   0xc130666c <+120>:	mov	r12, r0
   0xc1306670 <+124>:	cmp	r12, #0
   0xc1306674 <+128>:	movw	r3, #2040	; 0x7f8
   0xc1306678 <+132>:	movt	r3, #49510	; 0xc166
   0xc130667c <+136>:	str	r12, [r3]
   0xc1306680 <+140>:	mvneq	r6, #11
   0xc1306684 <+144>:	beq	0xc130679c <vdso_init+424>
   0xc1306688 <+148>:	ldr	r0, [pc, #280]	; 0xc13067a8 <vdso_init+436>
   0xc130668c <+152>:	movw	r1, #14272	; 0x37c0
   0xc1306690 <+156>:	movt	r1, #49513	; 0xc169
   0xc1306694 <+160>:	movw	r3, #16648	; 0x4108
   0xc1306698 <+164>:	movt	r3, #49434	; 0xc11a
   0xc130669c <+168>:	ldr	r2, [r1]
   0xc13066a0 <+172>:	lsr	r0, r0, #12
   0xc13066a4 <+176>:	mov	r7, r3
   0xc13066a8 <+180>:	mov	lr, #0
   0xc13066ac <+184>:	add	r2, r2, r0, lsl #5
   0xc13066b0 <+188>:	str	r2, [r3]
   0xc13066b4 <+192>:	add	r3, lr, #262144	; 0x40000
   0xc13066b8 <+196>:	cmp	lr, r4
   0xc13066bc <+200>:	add	r3, r5, r3, lsl #12
   0xc13066c0 <+204>:	beq	0xc13066dc <vdso_init+232>
   0xc13066c4 <+208>:	ldr	r2, [r1]
   0xc13066c8 <+212>:	lsr	r3, r3, #12
   0xc13066cc <+216>:	add	r3, r2, r3, lsl #5
   0xc13066d0 <+220>:	str	r3, [r12, lr, lsl #2]
   0xc13066d4 <+224>:	add	lr, lr, #1
   0xc13066d8 <+228>:	b	0xc13066b4 <vdso_init+192>
   0xc13066dc <+232>:	mov	r0, #0
   0xc13066e0 <+236>:	movw	r2, #3720	; 0xe88
   0xc13066e4 <+240>:	mov	r1, r0
   0xc13066e8 <+244>:	movt	r2, #49416	; 0xc108
   0xc13066ec <+248>:	add	lr, lr, #1
   0xc13066f0 <+252>:	str	r12, [r7, #8]
   0xc13066f4 <+256>:	str	lr, [r7, #20]
   0xc13066f8 <+260>:	bl	0xc0bba534 <of_find_compatible_node>
   0xc13066fc <+264>:	subs	r8, r0, #0
   0xc1306700 <+268>:	moveq	r4, r8
   0xc1306704 <+272>:	beq	0xc1306720 <vdso_init+300>
   0xc1306708 <+276>:	movw	r1, #3736	; 0xe98
   0xc130670c <+280>:	mov	r2, #0
   0xc1306710 <+284>:	movt	r1, #49416	; 0xc108
   0xc1306714 <+288>:	bl	0xc0bb8b18 <of_find_property>
   0xc1306718 <+292>:	clz	r4, r0
   0xc130671c <+296>:	lsr	r4, r4, #5
   0xc1306720 <+300>:	mov	r0, r8
   0xc1306724 <+304>:	bl	0xc0bbcbb4 <of_node_put>
   0xc1306728 <+308>:	mov	r1, #0
   0xc130672c <+312>:	mov	r2, #16
   0xc1306730 <+316>:	sub	r0, r11, #52	; 0x34
   0xc1306734 <+320>:	strb	r4, [r7, #24]
   0xc1306738 <+324>:	bl	0xc0631320 <mmioset>
   0xc130673c <+328>:	movw	r0, #12288	; 0x3000
   0xc1306740 <+332>:	movw	r1, #3772	; 0xebc
   0xc1306744 <+336>:	movt	r1, #49416	; 0xc108
   0xc1306748 <+340>:	sub	r2, r11, #44	; 0x2c
   0xc130674c <+344>:	movt	r0, #49434	; 0xc11a
   0xc1306750 <+348>:	str	r5, [r11, #-52]	; 0xffffffcc
   0xc1306754 <+352>:	bl	0xc13064c8 <find_section>
   0xc1306758 <+356>:	movw	r1, #3780	; 0xec4
   0xc130675c <+360>:	movt	r1, #49416	; 0xc108
   0xc1306760 <+364>:	mov	r2, #0
   0xc1306764 <+368>:	str	r0, [r11, #-48]	; 0xffffffd0
   0xc1306768 <+372>:	ldr	r0, [r11, #-52]	; 0xffffffcc
   0xc130676c <+376>:	bl	0xc13064c8 <find_section>
   0xc1306770 <+380>:	cmp	r4, #0
   0xc1306774 <+384>:	str	r0, [r11, #-40]	; 0xffffffd8
   0xc1306778 <+388>:	bne	0xc130679c <vdso_init+424>
   0xc130677c <+392>:	movw	r1, #3788	; 0xecc
   0xc1306780 <+396>:	sub	r0, r11, #52	; 0x34
   0xc1306784 <+400>:	movt	r1, #49416	; 0xc108
   0xc1306788 <+404>:	bl	0xc1306560 <vdso_nullpatch_one>
   0xc130678c <+408>:	movw	r1, #3808	; 0xee0
   0xc1306790 <+412>:	sub	r0, r11, #52	; 0x34
   0xc1306794 <+416>:	movt	r1, #49416	; 0xc108
   0xc1306798 <+420>:	bl	0xc1306560 <vdso_nullpatch_one>
   0xc130679c <+424>:	mov	r0, r6
   0xc13067a0 <+428>:	sub	sp, r11, #32
   0xc13067a4 <+432>:	ldm	sp, {r4, r5, r6, r7, r8, r11, sp, pc}
   0xc13067a8 <+436>:	msreq	SPSR_sc, r0
End of assembler dump.

Offset 0xbc (188 decimal) is a store.

I haven't verified it myself but Phil Elwells first analysis here
sounds plausible

https://github.com/raspberrypi/linux/issues/2166#issuecomment-323355145

> I don't have a complete answer yet, but I can tell you that the problem
> is caused by writes to variables that are declared as __ro_after_init.
> The first failure is vdso_data_page being written from vdso_init, but
> if you remove the attribute then the fault moves to the next write
> that fits the pattern (vdso_text_mapping, cntvct_ok), eventually
> appearing in a different function altogether (ptmx_fops, written from
> tty_default_fops).
> 
> What makes this interesting (and is the reason this doesn't fail
> normally) is that these functions are all called within kernel_init, so
> the __ro_after_init attribute shouldn't have caused the variable to be
> read-only yet.

Drop me a line if you want me to do some more testing.

so long & thanks for looking into it,

Hias

> 
> -- Steve
> 
> 
> > [    0.857996] LR is at 0x0
> > [    0.860590] pc : [<c13066b0>]    lr : [<00000000>]    psr: a0000013
> > [    0.860590] sp : ea285ea0  ip : ea757340  fp : ea285ed4
> > [    0.872297] r10: 00000000  r9 : c130060c  r8 : 00000135
> > [    0.877630] r7 : c11a4108  r6 : 00000000  r5 : c11a3000  r4 : 00000001
> > [    0.884290] r3 : c11a4108  r2 : eaa25be0  r1 : c16937c0  r0 : 0000165f
> > [    0.890952] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> > [    0.898231] Control: 10c5387d  Table: 0020406a  DAC: 00000051
> > [    0.904094] Process swapper/0 (pid: 1, stack limit = 0xea284220)
> > [    0.910224] Stack: (0xea285ea0 to 0xea286000)
> > [    0.914681] 5ea0: 00000003 c1660000 c1660000 00000135 ea285f54 c1465bf4 c13065f4 c1660000
> > [    0.923028] 5ec0: c1660000 00000135 ea285f54 ea285ed8 c0302390 c1306600 c03121e0 c03e4f48
> > [    0.931376] 5ee0: ea285f0c ea285ef0 c1300600 c0638fd4 00000000 c11f11a0 c0e25c00 00000135
> > [    0.939723] 5f00: ea285f54 ea285f10 c036c424 c1300618 00000003 00000003 00000000 c11ef97c
> > [    0.948071] 5f20: c107da58 00000000 c1660000 c1465bf4 00000003 c1660000 c1660000 00000135
> > [    0.956418] 5f40: c130060c c13d1828 ea285f94 ea285f58 c1300e74 c030234c 00000003 00000003
> > [    0.964765] 5f60: 00000000 c130060c aaaaaaaf 00000000 c0d1d074 00000000 00000000 00000000
> > [    0.973113] 5f80: 00000000 00000000 ea285fac ea285f98 c0d1d08c c1300d1c ea284000 00000000
> > [    0.981460] 5fa0: 00000000 ea285fb0 c0308af8 c0d1d080 00000000 00000000 00000000 00000000
> > [    0.989806] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [    0.998153] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 aabaaaba eaaaaaaa
> > [    1.006531] [<c13066b0>] (vdso_init) from [<c0302390>] (do_one_initcall+0x50/0x180)
> > [    1.014365] [<c0302390>] (do_one_initcall) from [<c1300e74>] (kernel_init_freeable+0x164/0x204)
> > [    1.023260] [<c1300e74>] (kernel_init_freeable) from [<c0d1d08c>] (kernel_init+0x18/0x124)
> > [    1.031713] [<c0d1d08c>] (kernel_init) from [<c0308af8>] (ret_from_fork+0x14/0x3c)
> > [    1.039444] Code: e1a00620 e1a07003 e3a0e000 e0822280 (e5832000) 
> > [    1.045706] ---[ end trace f660ee60085b3a9f ]---
> > [    1.050670] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [    1.050670] 
> > [    1.060016] CPU2: stopping
> > [    1.062793] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D         4.9.44 #1
> > [    1.070070] Hardware name: BCM2835
> > [    1.073576] [<c03133dc>] (unwind_backtrace) from [<c030d610>] (show_stack+0x20/0x24)
> > [    1.081498] [<c030d610>] (show_stack) from [<c063220c>] (dump_stack+0xa0/0xb4)
> > [    1.088889] [<c063220c>] (dump_stack) from [<c031118c>] (handle_IPI+0x2e0/0x300)
> > [    1.096456] [<c031118c>] (handle_IPI) from [<c03015a0>] (bcm2836_arm_irqchip_handle_irq+0x84/0xb4)
> > [    1.105613] [<c03015a0>] (bcm2836_arm_irqchip_handle_irq) from [<c030e2cc>] (__irq_svc+0x6c/0x90)
> > [    1.114662] Exception stack(0xea2bdf48 to 0xea2bdf90)
> > [    1.119823] df40:                   00000001 00000000 00000000 c031f960 ea2bc000 c1503034
> > [    1.128170] df60: c1503098 00000004 00000000 00000000 c15030a0 ea2bdfa4 c0309598 ea2bdf98
> > [    1.136513] df80: c03095c8 c03095cc 60000013 ffffffff
> > [    1.141690] [<c030e2cc>] (__irq_svc) from [<c03095cc>] (arch_cpu_idle+0x48/0x4c)
> > [    1.149256] [<c03095cc>] (arch_cpu_idle) from [<c0d23060>] (default_idle_call+0x30/0x3c)
> > [    1.157529] [<c0d23060>] (default_idle_call) from [<c038fc30>] (cpu_startup_entry+0x1ac/0x230)
> > [    1.166334] [<c038fc30>] (cpu_startup_entry) from [<c0310c5c>] (secondary_start_kernel+0x128/0x134)
> > [    1.175577] [<c0310c5c>] (secondary_start_kernel) from [<0030210c>] (0x30210c)
> > [ 1.182949] CPU3: stopping
> > 
> > Full trace is here: https://gist.github.com/HiassofT/03b3685b53ab67a0b3e8eb9cf8f91ccd
> > 
> > I've got similar crashes on the downstream RPi kernels 4.9 and 4.12
> > and Phil Elwell posted his analysis of the 4.9 problem in the github
> > issue https://github.com/raspberrypi/linux/issues/2166
> > 
> > > The mark_readonly and mark_rodata_ro are not being called early (and
> > > adding rodata=n to the command line doesn't fix it), so it must be that
> > > the default protection state is being set to read-only.
> > > 
> > > Digging deeper I've narrowed it down to the call to set_kernel_text_ro
> > > from the __ftrace_modify_code function in arch/arm/kernel/ftrace.c. It
> > > uses the same mechanism and section data as mark_rodata_ro. My theory
> > > is:
> > > 
> > > 1. Data which is marked __ro_after_init is initially writeable.
> > > 2. The ro_perms data covers kernel text, read-only data and
> > >    __ro_after_init data.
> > > 3. set_kernel_text_rw marks everything in ro_perms as writeable.
> > > 4. set_kernel_text_ro marks everything in ro_perms as read-only,
> > >    including the __ro_after_init data.
> > > 5. Using the function tracing code involves code modification,
> > >    resulting in calls to __ftrace_modify_code and set_kernel_text_ro.
> > > 6. Therefore if function tracing is enabled before kernel_init has
> > >    completed then the __ro_after_init data is made read-only prematurely.
> > > 
> > > This doesn't look like a Pi-specific bug/feature. A fix would require
> > > separating out the section permission data for the __ro_after_init,
> > > updating the mark_rodata_ro to use the new permissions data instead.  
> > 
> > so long,
> > 
> > Hias
> 



More information about the linux-rpi-kernel mailing list