Crash when booting RPi2 / bcm2836 with ftrace=function
Matthias Reichl
hias at horus.com
Wed Aug 23 08:42:14 PDT 2017
On Wed, Aug 23, 2017 at 10:29:31AM -0400, Steven Rostedt wrote:
> On Mon, 21 Aug 2017 17:34:02 +0200
> Matthias Reichl <hias at horus.com> wrote:
>
> > When I enable the function tracer during boot on Raspberry Pi 2
> > /bcm2836) I get a kernel crash very early in the boot process:
>
> This is because I moved up the start time of function tracing to allow
> to trace earlier in the boot process.
>
> See commit f631718de3ca "ftrace: Move ftrace_init() to right after
> memory initialization"
>
> And commit e725c731e3b "tracing: Split tracing initialization into two
> for early initialization" because people wanted to trace earlier in boot up.
>
> I was nervous that some archs may take issue with this.
>
> >
> > [ 0.000000] Unable to handle kernel paging request at virtual address c11d00e8
> > [ 0.000000] pgd = c0204000
> > [ 0.000000] [c11d00e8] *pgd=0101940e(bad)
> > [ 0.000000] Internal error: Oops: 80d [#1] SMP ARM
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.8 #1
> > [ 0.000000] Hardware name: BCM2835
> > [ 0.000000] task: c1608100 task.stack: c1600000
> > [ 0.000000] PC is at smp_set_ops+0x2c/0x3c
> > [ 0.000000] LR is at 0xc11d00e8
> > [ 0.000000] pc : [<c14058c4>] lr : [<c11d00e8>] psr: a00000d3
> > [ 0.000000] sp : c1601f08 ip : c153af40 fp : c1601f14
> > [ 0.000000] r10: c1601f48 r9 : 00000200 r8 : 00000100
> > [ 0.000000] r7 : c1601f50 r6 : ea9f7660 r5 : 00000000 r4 : c16043bc
> > [ 0.000000] r3 : c14451c4 r2 : 00000000 r1 : 00000000 r0 : 00000000
> > [ 0.000000] Flags: NzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
> > [ 0.000000] Control: 10c5387d Table: 0020406a DAC: 00000051
> > [ 0.000000] Process swapper/0 (pid: 0, stack limit = 0xc1600220)
> > [ 0.000000] Stack: (0xc1601f08 to 0xc1602000)
> > [ 0.000000] 1f00: c1601f3c c1601f18 c1445374 c14058a4 c068d488 00000000
> > [ 0.000000] 1f20: ea9f7660 ea244300 c1601f48 00000000 c1601f84 c1601f40 c14b78e8 c1445220
> > [ 0.000000] 1f40: 00000010 00000000 ea2442c0 ea2442c0 c1601f50 c1601f50 c065d450 c14dca20
> > [ 0.000000] 1f60: c1780cc0 c1603c40 ffffffff c1780cc0 c14dca30 eafff940 c1601f94 c1601f88
> > [ 0.000000] 1f80: c1444fac c14b7780 c1601fac c1601f98 c1403994 c1444f98 00000000 00000000
> > [ 0.000000] 1fa0: c1601ff4 c1601fb0 c1400c54 c140396c ffffffff ffffffff 00000000 c1400750
> > [ 0.000000] 1fc0: 00000000 c14dca30 00000000 c1781294 c1603c58 c14dca2c c16094e4 0020406a
> > [ 0.000000] 1fe0: 410fc075 00000000 00000000 c1601ff8 0020807c c1400a18 00000000 00000000
> > [ 0.000000] [<c14058c4>] (smp_set_ops) from [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init+0x160/0x180)
> > [ 0.000000] [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init) from [<c14b78e8>] (of_irq_init+0x174/0x2c4)
> > [ 0.000000] [<c14b78e8>] (of_irq_init) from [<c1444fac>] (irqchip_init+0x20/0x24)
> > [ 0.000000] [<c1444fac>] (irqchip_init) from [<c1403994>] (init_IRQ+0x34/0x98)
> > [ 0.000000] [<c1403994>] (init_IRQ) from [<c1400c54>] (start_kernel+0x248/0x3a8)
> > [ 0.000000] [<c1400c54>] (start_kernel) from [<0020807c>] (0x20807c)
> > [ 0.000000] Code: e1a0c000 e300e0e8 e8bc000f e34ce11d (e8ae000f)
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
> >
> > Full log is here: https://gist.github.com/HiassofT/45c4b03f4663a6d19e7a79cfbfd9c560
> >
> > This was on 4.12.8, multi_v7_defconfig plus CONFIG_FUNCTION_TRACER
> > selected in menuconfig and ftrace_filter=__request_module ftrace=function
> > added to kernel command line.
> >
> > The same crash also occurs on 4.13-rc5 and with only
> > ftrace=function on the kernel command line.
> >
> > Enabling the function tracer at runtime via
> > echo function > /sys/kernel/tracing/current_tracer
> > works as expected.
> >
> > With kernel 4.9.44 the crash is a little bit different:
>
> Interesting, because my updates happened in 4.12.
>
> >
> > [ 0.816852] Unable to handle kernel paging request at virtual address c11a4108
> > [ 0.824490] pgd = c0204000
> > [ 0.827382] [c11a4108] *pgd=0101940e(bad)
> > [ 0.831714] Internal error: Oops: 80d [#1] SMP ARM
> > [ 0.836605] Modules linked in:
> > [ 0.839743] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.44 #1
> > [ 0.845781] Hardware name: BCM2835
> > [ 0.849257] task: ea288000 task.stack: ea284000
> > [ 0.853899] PC is at vdso_init+0xbc/0x1b8
>
> Interesting. vdso should not be traced, but vdso_init can be. What was
> the call that it was making? That is, have an objdump of these
> locations?
Here's the disassembly of vdso_init:
Dump of assembler code for function vdso_init:
0xc13065f4 <+0>: mov r12, sp
0xc13065f8 <+4>: push {r4, r5, r6, r7, r8, r11, r12, lr, pc}
0xc13065fc <+8>: sub r11, r12, #4
0xc1306600 <+12>: sub sp, sp, #20
0xc1306604 <+16>: movw r4, #12288 ; 0x3000
0xc1306608 <+20>: movt r4, #49434 ; 0xc11a
0xc130660c <+24>: movw r1, #3676 ; 0xe5c
0xc1306610 <+28>: movt r1, #49416 ; 0xc108
0xc1306614 <+32>: mov r0, r4
0xc1306618 <+36>: mov r2, #4
0xc130661c <+40>: mov r5, r4
0xc1306620 <+44>: bl 0xc063935c <memcmp>
0xc1306624 <+48>: subs r6, r0, #0
0xc1306628 <+52>: beq 0xc1306640 <vdso_init+76>
0xc130662c <+56>: movw r0, #3684 ; 0xe64
0xc1306630 <+60>: mvn r6, #7
0xc1306634 <+64>: movt r0, #49416 ; 0xc108
0xc1306638 <+68>: bl 0xc04174d8 <printk>
0xc130663c <+72>: b 0xc130679c <vdso_init+424>
0xc1306640 <+76>: movw r3, #16384 ; 0x4000
0xc1306644 <+80>: movt r3, #49434 ; 0xc11a
0xc1306648 <+84>: rsb r3, r4, r3
0xc130664c <+88>: asr r4, r3, #12
0xc1306650 <+92>: cmn r4, #-1073741823 ; 0xc0000001
0xc1306654 <+96>: movhi r12, r6
0xc1306658 <+100>: bhi 0xc1306670 <vdso_init+124>
0xc130665c <+104>: movw r1, #32960 ; 0x80c0
0xc1306660 <+108>: lsl r0, r4, #2
0xc1306664 <+112>: movt r1, #576 ; 0x240
0xc1306668 <+116>: bl 0xc0465f48 <__kmalloc>
0xc130666c <+120>: mov r12, r0
0xc1306670 <+124>: cmp r12, #0
0xc1306674 <+128>: movw r3, #2040 ; 0x7f8
0xc1306678 <+132>: movt r3, #49510 ; 0xc166
0xc130667c <+136>: str r12, [r3]
0xc1306680 <+140>: mvneq r6, #11
0xc1306684 <+144>: beq 0xc130679c <vdso_init+424>
0xc1306688 <+148>: ldr r0, [pc, #280] ; 0xc13067a8 <vdso_init+436>
0xc130668c <+152>: movw r1, #14272 ; 0x37c0
0xc1306690 <+156>: movt r1, #49513 ; 0xc169
0xc1306694 <+160>: movw r3, #16648 ; 0x4108
0xc1306698 <+164>: movt r3, #49434 ; 0xc11a
0xc130669c <+168>: ldr r2, [r1]
0xc13066a0 <+172>: lsr r0, r0, #12
0xc13066a4 <+176>: mov r7, r3
0xc13066a8 <+180>: mov lr, #0
0xc13066ac <+184>: add r2, r2, r0, lsl #5
0xc13066b0 <+188>: str r2, [r3]
0xc13066b4 <+192>: add r3, lr, #262144 ; 0x40000
0xc13066b8 <+196>: cmp lr, r4
0xc13066bc <+200>: add r3, r5, r3, lsl #12
0xc13066c0 <+204>: beq 0xc13066dc <vdso_init+232>
0xc13066c4 <+208>: ldr r2, [r1]
0xc13066c8 <+212>: lsr r3, r3, #12
0xc13066cc <+216>: add r3, r2, r3, lsl #5
0xc13066d0 <+220>: str r3, [r12, lr, lsl #2]
0xc13066d4 <+224>: add lr, lr, #1
0xc13066d8 <+228>: b 0xc13066b4 <vdso_init+192>
0xc13066dc <+232>: mov r0, #0
0xc13066e0 <+236>: movw r2, #3720 ; 0xe88
0xc13066e4 <+240>: mov r1, r0
0xc13066e8 <+244>: movt r2, #49416 ; 0xc108
0xc13066ec <+248>: add lr, lr, #1
0xc13066f0 <+252>: str r12, [r7, #8]
0xc13066f4 <+256>: str lr, [r7, #20]
0xc13066f8 <+260>: bl 0xc0bba534 <of_find_compatible_node>
0xc13066fc <+264>: subs r8, r0, #0
0xc1306700 <+268>: moveq r4, r8
0xc1306704 <+272>: beq 0xc1306720 <vdso_init+300>
0xc1306708 <+276>: movw r1, #3736 ; 0xe98
0xc130670c <+280>: mov r2, #0
0xc1306710 <+284>: movt r1, #49416 ; 0xc108
0xc1306714 <+288>: bl 0xc0bb8b18 <of_find_property>
0xc1306718 <+292>: clz r4, r0
0xc130671c <+296>: lsr r4, r4, #5
0xc1306720 <+300>: mov r0, r8
0xc1306724 <+304>: bl 0xc0bbcbb4 <of_node_put>
0xc1306728 <+308>: mov r1, #0
0xc130672c <+312>: mov r2, #16
0xc1306730 <+316>: sub r0, r11, #52 ; 0x34
0xc1306734 <+320>: strb r4, [r7, #24]
0xc1306738 <+324>: bl 0xc0631320 <mmioset>
0xc130673c <+328>: movw r0, #12288 ; 0x3000
0xc1306740 <+332>: movw r1, #3772 ; 0xebc
0xc1306744 <+336>: movt r1, #49416 ; 0xc108
0xc1306748 <+340>: sub r2, r11, #44 ; 0x2c
0xc130674c <+344>: movt r0, #49434 ; 0xc11a
0xc1306750 <+348>: str r5, [r11, #-52] ; 0xffffffcc
0xc1306754 <+352>: bl 0xc13064c8 <find_section>
0xc1306758 <+356>: movw r1, #3780 ; 0xec4
0xc130675c <+360>: movt r1, #49416 ; 0xc108
0xc1306760 <+364>: mov r2, #0
0xc1306764 <+368>: str r0, [r11, #-48] ; 0xffffffd0
0xc1306768 <+372>: ldr r0, [r11, #-52] ; 0xffffffcc
0xc130676c <+376>: bl 0xc13064c8 <find_section>
0xc1306770 <+380>: cmp r4, #0
0xc1306774 <+384>: str r0, [r11, #-40] ; 0xffffffd8
0xc1306778 <+388>: bne 0xc130679c <vdso_init+424>
0xc130677c <+392>: movw r1, #3788 ; 0xecc
0xc1306780 <+396>: sub r0, r11, #52 ; 0x34
0xc1306784 <+400>: movt r1, #49416 ; 0xc108
0xc1306788 <+404>: bl 0xc1306560 <vdso_nullpatch_one>
0xc130678c <+408>: movw r1, #3808 ; 0xee0
0xc1306790 <+412>: sub r0, r11, #52 ; 0x34
0xc1306794 <+416>: movt r1, #49416 ; 0xc108
0xc1306798 <+420>: bl 0xc1306560 <vdso_nullpatch_one>
0xc130679c <+424>: mov r0, r6
0xc13067a0 <+428>: sub sp, r11, #32
0xc13067a4 <+432>: ldm sp, {r4, r5, r6, r7, r8, r11, sp, pc}
0xc13067a8 <+436>: msreq SPSR_sc, r0
End of assembler dump.
Offset 0xbc (188 decimal) is a store.
I haven't verified it myself but Phil Elwells first analysis here
sounds plausible
https://github.com/raspberrypi/linux/issues/2166#issuecomment-323355145
> I don't have a complete answer yet, but I can tell you that the problem
> is caused by writes to variables that are declared as __ro_after_init.
> The first failure is vdso_data_page being written from vdso_init, but
> if you remove the attribute then the fault moves to the next write
> that fits the pattern (vdso_text_mapping, cntvct_ok), eventually
> appearing in a different function altogether (ptmx_fops, written from
> tty_default_fops).
>
> What makes this interesting (and is the reason this doesn't fail
> normally) is that these functions are all called within kernel_init, so
> the __ro_after_init attribute shouldn't have caused the variable to be
> read-only yet.
Drop me a line if you want me to do some more testing.
so long & thanks for looking into it,
Hias
>
> -- Steve
>
>
> > [ 0.857996] LR is at 0x0
> > [ 0.860590] pc : [<c13066b0>] lr : [<00000000>] psr: a0000013
> > [ 0.860590] sp : ea285ea0 ip : ea757340 fp : ea285ed4
> > [ 0.872297] r10: 00000000 r9 : c130060c r8 : 00000135
> > [ 0.877630] r7 : c11a4108 r6 : 00000000 r5 : c11a3000 r4 : 00000001
> > [ 0.884290] r3 : c11a4108 r2 : eaa25be0 r1 : c16937c0 r0 : 0000165f
> > [ 0.890952] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> > [ 0.898231] Control: 10c5387d Table: 0020406a DAC: 00000051
> > [ 0.904094] Process swapper/0 (pid: 1, stack limit = 0xea284220)
> > [ 0.910224] Stack: (0xea285ea0 to 0xea286000)
> > [ 0.914681] 5ea0: 00000003 c1660000 c1660000 00000135 ea285f54 c1465bf4 c13065f4 c1660000
> > [ 0.923028] 5ec0: c1660000 00000135 ea285f54 ea285ed8 c0302390 c1306600 c03121e0 c03e4f48
> > [ 0.931376] 5ee0: ea285f0c ea285ef0 c1300600 c0638fd4 00000000 c11f11a0 c0e25c00 00000135
> > [ 0.939723] 5f00: ea285f54 ea285f10 c036c424 c1300618 00000003 00000003 00000000 c11ef97c
> > [ 0.948071] 5f20: c107da58 00000000 c1660000 c1465bf4 00000003 c1660000 c1660000 00000135
> > [ 0.956418] 5f40: c130060c c13d1828 ea285f94 ea285f58 c1300e74 c030234c 00000003 00000003
> > [ 0.964765] 5f60: 00000000 c130060c aaaaaaaf 00000000 c0d1d074 00000000 00000000 00000000
> > [ 0.973113] 5f80: 00000000 00000000 ea285fac ea285f98 c0d1d08c c1300d1c ea284000 00000000
> > [ 0.981460] 5fa0: 00000000 ea285fb0 c0308af8 c0d1d080 00000000 00000000 00000000 00000000
> > [ 0.989806] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > [ 0.998153] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 aabaaaba eaaaaaaa
> > [ 1.006531] [<c13066b0>] (vdso_init) from [<c0302390>] (do_one_initcall+0x50/0x180)
> > [ 1.014365] [<c0302390>] (do_one_initcall) from [<c1300e74>] (kernel_init_freeable+0x164/0x204)
> > [ 1.023260] [<c1300e74>] (kernel_init_freeable) from [<c0d1d08c>] (kernel_init+0x18/0x124)
> > [ 1.031713] [<c0d1d08c>] (kernel_init) from [<c0308af8>] (ret_from_fork+0x14/0x3c)
> > [ 1.039444] Code: e1a00620 e1a07003 e3a0e000 e0822280 (e5832000)
> > [ 1.045706] ---[ end trace f660ee60085b3a9f ]---
> > [ 1.050670] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [ 1.050670]
> > [ 1.060016] CPU2: stopping
> > [ 1.062793] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 4.9.44 #1
> > [ 1.070070] Hardware name: BCM2835
> > [ 1.073576] [<c03133dc>] (unwind_backtrace) from [<c030d610>] (show_stack+0x20/0x24)
> > [ 1.081498] [<c030d610>] (show_stack) from [<c063220c>] (dump_stack+0xa0/0xb4)
> > [ 1.088889] [<c063220c>] (dump_stack) from [<c031118c>] (handle_IPI+0x2e0/0x300)
> > [ 1.096456] [<c031118c>] (handle_IPI) from [<c03015a0>] (bcm2836_arm_irqchip_handle_irq+0x84/0xb4)
> > [ 1.105613] [<c03015a0>] (bcm2836_arm_irqchip_handle_irq) from [<c030e2cc>] (__irq_svc+0x6c/0x90)
> > [ 1.114662] Exception stack(0xea2bdf48 to 0xea2bdf90)
> > [ 1.119823] df40: 00000001 00000000 00000000 c031f960 ea2bc000 c1503034
> > [ 1.128170] df60: c1503098 00000004 00000000 00000000 c15030a0 ea2bdfa4 c0309598 ea2bdf98
> > [ 1.136513] df80: c03095c8 c03095cc 60000013 ffffffff
> > [ 1.141690] [<c030e2cc>] (__irq_svc) from [<c03095cc>] (arch_cpu_idle+0x48/0x4c)
> > [ 1.149256] [<c03095cc>] (arch_cpu_idle) from [<c0d23060>] (default_idle_call+0x30/0x3c)
> > [ 1.157529] [<c0d23060>] (default_idle_call) from [<c038fc30>] (cpu_startup_entry+0x1ac/0x230)
> > [ 1.166334] [<c038fc30>] (cpu_startup_entry) from [<c0310c5c>] (secondary_start_kernel+0x128/0x134)
> > [ 1.175577] [<c0310c5c>] (secondary_start_kernel) from [<0030210c>] (0x30210c)
> > [ 1.182949] CPU3: stopping
> >
> > Full trace is here: https://gist.github.com/HiassofT/03b3685b53ab67a0b3e8eb9cf8f91ccd
> >
> > I've got similar crashes on the downstream RPi kernels 4.9 and 4.12
> > and Phil Elwell posted his analysis of the 4.9 problem in the github
> > issue https://github.com/raspberrypi/linux/issues/2166
> >
> > > The mark_readonly and mark_rodata_ro are not being called early (and
> > > adding rodata=n to the command line doesn't fix it), so it must be that
> > > the default protection state is being set to read-only.
> > >
> > > Digging deeper I've narrowed it down to the call to set_kernel_text_ro
> > > from the __ftrace_modify_code function in arch/arm/kernel/ftrace.c. It
> > > uses the same mechanism and section data as mark_rodata_ro. My theory
> > > is:
> > >
> > > 1. Data which is marked __ro_after_init is initially writeable.
> > > 2. The ro_perms data covers kernel text, read-only data and
> > > __ro_after_init data.
> > > 3. set_kernel_text_rw marks everything in ro_perms as writeable.
> > > 4. set_kernel_text_ro marks everything in ro_perms as read-only,
> > > including the __ro_after_init data.
> > > 5. Using the function tracing code involves code modification,
> > > resulting in calls to __ftrace_modify_code and set_kernel_text_ro.
> > > 6. Therefore if function tracing is enabled before kernel_init has
> > > completed then the __ro_after_init data is made read-only prematurely.
> > >
> > > This doesn't look like a Pi-specific bug/feature. A fix would require
> > > separating out the section permission data for the __ro_after_init,
> > > updating the mark_rodata_ro to use the new permissions data instead.
> >
> > so long,
> >
> > Hias
>
More information about the linux-rpi-kernel
mailing list