Crash when booting RPi2 / bcm2836 with ftrace=function
Steven Rostedt
rostedt at goodmis.org
Wed Aug 23 07:29:31 PDT 2017
On Mon, 21 Aug 2017 17:34:02 +0200
Matthias Reichl <hias at horus.com> wrote:
> When I enable the function tracer during boot on Raspberry Pi 2
> /bcm2836) I get a kernel crash very early in the boot process:
This is because I moved up the start time of function tracing to allow
to trace earlier in the boot process.
See commit f631718de3ca "ftrace: Move ftrace_init() to right after
memory initialization"
And commit e725c731e3b "tracing: Split tracing initialization into two
for early initialization" because people wanted to trace earlier in boot up.
I was nervous that some archs may take issue with this.
>
> [ 0.000000] Unable to handle kernel paging request at virtual address c11d00e8
> [ 0.000000] pgd = c0204000
> [ 0.000000] [c11d00e8] *pgd=0101940e(bad)
> [ 0.000000] Internal error: Oops: 80d [#1] SMP ARM
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.8 #1
> [ 0.000000] Hardware name: BCM2835
> [ 0.000000] task: c1608100 task.stack: c1600000
> [ 0.000000] PC is at smp_set_ops+0x2c/0x3c
> [ 0.000000] LR is at 0xc11d00e8
> [ 0.000000] pc : [<c14058c4>] lr : [<c11d00e8>] psr: a00000d3
> [ 0.000000] sp : c1601f08 ip : c153af40 fp : c1601f14
> [ 0.000000] r10: c1601f48 r9 : 00000200 r8 : 00000100
> [ 0.000000] r7 : c1601f50 r6 : ea9f7660 r5 : 00000000 r4 : c16043bc
> [ 0.000000] r3 : c14451c4 r2 : 00000000 r1 : 00000000 r0 : 00000000
> [ 0.000000] Flags: NzCv IRQs off FIQs off Mode SVC_32 ISA ARM Segment none
> [ 0.000000] Control: 10c5387d Table: 0020406a DAC: 00000051
> [ 0.000000] Process swapper/0 (pid: 0, stack limit = 0xc1600220)
> [ 0.000000] Stack: (0xc1601f08 to 0xc1602000)
> [ 0.000000] 1f00: c1601f3c c1601f18 c1445374 c14058a4 c068d488 00000000
> [ 0.000000] 1f20: ea9f7660 ea244300 c1601f48 00000000 c1601f84 c1601f40 c14b78e8 c1445220
> [ 0.000000] 1f40: 00000010 00000000 ea2442c0 ea2442c0 c1601f50 c1601f50 c065d450 c14dca20
> [ 0.000000] 1f60: c1780cc0 c1603c40 ffffffff c1780cc0 c14dca30 eafff940 c1601f94 c1601f88
> [ 0.000000] 1f80: c1444fac c14b7780 c1601fac c1601f98 c1403994 c1444f98 00000000 00000000
> [ 0.000000] 1fa0: c1601ff4 c1601fb0 c1400c54 c140396c ffffffff ffffffff 00000000 c1400750
> [ 0.000000] 1fc0: 00000000 c14dca30 00000000 c1781294 c1603c58 c14dca2c c16094e4 0020406a
> [ 0.000000] 1fe0: 410fc075 00000000 00000000 c1601ff8 0020807c c1400a18 00000000 00000000
> [ 0.000000] [<c14058c4>] (smp_set_ops) from [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init+0x160/0x180)
> [ 0.000000] [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init) from [<c14b78e8>] (of_irq_init+0x174/0x2c4)
> [ 0.000000] [<c14b78e8>] (of_irq_init) from [<c1444fac>] (irqchip_init+0x20/0x24)
> [ 0.000000] [<c1444fac>] (irqchip_init) from [<c1403994>] (init_IRQ+0x34/0x98)
> [ 0.000000] [<c1403994>] (init_IRQ) from [<c1400c54>] (start_kernel+0x248/0x3a8)
> [ 0.000000] [<c1400c54>] (start_kernel) from [<0020807c>] (0x20807c)
> [ 0.000000] Code: e1a0c000 e300e0e8 e8bc000f e34ce11d (e8ae000f)
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
>
> Full log is here: https://gist.github.com/HiassofT/45c4b03f4663a6d19e7a79cfbfd9c560
>
> This was on 4.12.8, multi_v7_defconfig plus CONFIG_FUNCTION_TRACER
> selected in menuconfig and ftrace_filter=__request_module ftrace=function
> added to kernel command line.
>
> The same crash also occurs on 4.13-rc5 and with only
> ftrace=function on the kernel command line.
>
> Enabling the function tracer at runtime via
> echo function > /sys/kernel/tracing/current_tracer
> works as expected.
>
> With kernel 4.9.44 the crash is a little bit different:
Interesting, because my updates happened in 4.12.
>
> [ 0.816852] Unable to handle kernel paging request at virtual address c11a4108
> [ 0.824490] pgd = c0204000
> [ 0.827382] [c11a4108] *pgd=0101940e(bad)
> [ 0.831714] Internal error: Oops: 80d [#1] SMP ARM
> [ 0.836605] Modules linked in:
> [ 0.839743] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.44 #1
> [ 0.845781] Hardware name: BCM2835
> [ 0.849257] task: ea288000 task.stack: ea284000
> [ 0.853899] PC is at vdso_init+0xbc/0x1b8
Interesting. vdso should not be traced, but vdso_init can be. What was
the call that it was making? That is, have an objdump of these
locations?
-- Steve
> [ 0.857996] LR is at 0x0
> [ 0.860590] pc : [<c13066b0>] lr : [<00000000>] psr: a0000013
> [ 0.860590] sp : ea285ea0 ip : ea757340 fp : ea285ed4
> [ 0.872297] r10: 00000000 r9 : c130060c r8 : 00000135
> [ 0.877630] r7 : c11a4108 r6 : 00000000 r5 : c11a3000 r4 : 00000001
> [ 0.884290] r3 : c11a4108 r2 : eaa25be0 r1 : c16937c0 r0 : 0000165f
> [ 0.890952] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
> [ 0.898231] Control: 10c5387d Table: 0020406a DAC: 00000051
> [ 0.904094] Process swapper/0 (pid: 1, stack limit = 0xea284220)
> [ 0.910224] Stack: (0xea285ea0 to 0xea286000)
> [ 0.914681] 5ea0: 00000003 c1660000 c1660000 00000135 ea285f54 c1465bf4 c13065f4 c1660000
> [ 0.923028] 5ec0: c1660000 00000135 ea285f54 ea285ed8 c0302390 c1306600 c03121e0 c03e4f48
> [ 0.931376] 5ee0: ea285f0c ea285ef0 c1300600 c0638fd4 00000000 c11f11a0 c0e25c00 00000135
> [ 0.939723] 5f00: ea285f54 ea285f10 c036c424 c1300618 00000003 00000003 00000000 c11ef97c
> [ 0.948071] 5f20: c107da58 00000000 c1660000 c1465bf4 00000003 c1660000 c1660000 00000135
> [ 0.956418] 5f40: c130060c c13d1828 ea285f94 ea285f58 c1300e74 c030234c 00000003 00000003
> [ 0.964765] 5f60: 00000000 c130060c aaaaaaaf 00000000 c0d1d074 00000000 00000000 00000000
> [ 0.973113] 5f80: 00000000 00000000 ea285fac ea285f98 c0d1d08c c1300d1c ea284000 00000000
> [ 0.981460] 5fa0: 00000000 ea285fb0 c0308af8 c0d1d080 00000000 00000000 00000000 00000000
> [ 0.989806] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 0.998153] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 aabaaaba eaaaaaaa
> [ 1.006531] [<c13066b0>] (vdso_init) from [<c0302390>] (do_one_initcall+0x50/0x180)
> [ 1.014365] [<c0302390>] (do_one_initcall) from [<c1300e74>] (kernel_init_freeable+0x164/0x204)
> [ 1.023260] [<c1300e74>] (kernel_init_freeable) from [<c0d1d08c>] (kernel_init+0x18/0x124)
> [ 1.031713] [<c0d1d08c>] (kernel_init) from [<c0308af8>] (ret_from_fork+0x14/0x3c)
> [ 1.039444] Code: e1a00620 e1a07003 e3a0e000 e0822280 (e5832000)
> [ 1.045706] ---[ end trace f660ee60085b3a9f ]---
> [ 1.050670] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [ 1.050670]
> [ 1.060016] CPU2: stopping
> [ 1.062793] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 4.9.44 #1
> [ 1.070070] Hardware name: BCM2835
> [ 1.073576] [<c03133dc>] (unwind_backtrace) from [<c030d610>] (show_stack+0x20/0x24)
> [ 1.081498] [<c030d610>] (show_stack) from [<c063220c>] (dump_stack+0xa0/0xb4)
> [ 1.088889] [<c063220c>] (dump_stack) from [<c031118c>] (handle_IPI+0x2e0/0x300)
> [ 1.096456] [<c031118c>] (handle_IPI) from [<c03015a0>] (bcm2836_arm_irqchip_handle_irq+0x84/0xb4)
> [ 1.105613] [<c03015a0>] (bcm2836_arm_irqchip_handle_irq) from [<c030e2cc>] (__irq_svc+0x6c/0x90)
> [ 1.114662] Exception stack(0xea2bdf48 to 0xea2bdf90)
> [ 1.119823] df40: 00000001 00000000 00000000 c031f960 ea2bc000 c1503034
> [ 1.128170] df60: c1503098 00000004 00000000 00000000 c15030a0 ea2bdfa4 c0309598 ea2bdf98
> [ 1.136513] df80: c03095c8 c03095cc 60000013 ffffffff
> [ 1.141690] [<c030e2cc>] (__irq_svc) from [<c03095cc>] (arch_cpu_idle+0x48/0x4c)
> [ 1.149256] [<c03095cc>] (arch_cpu_idle) from [<c0d23060>] (default_idle_call+0x30/0x3c)
> [ 1.157529] [<c0d23060>] (default_idle_call) from [<c038fc30>] (cpu_startup_entry+0x1ac/0x230)
> [ 1.166334] [<c038fc30>] (cpu_startup_entry) from [<c0310c5c>] (secondary_start_kernel+0x128/0x134)
> [ 1.175577] [<c0310c5c>] (secondary_start_kernel) from [<0030210c>] (0x30210c)
> [ 1.182949] CPU3: stopping
>
> Full trace is here: https://gist.github.com/HiassofT/03b3685b53ab67a0b3e8eb9cf8f91ccd
>
> I've got similar crashes on the downstream RPi kernels 4.9 and 4.12
> and Phil Elwell posted his analysis of the 4.9 problem in the github
> issue https://github.com/raspberrypi/linux/issues/2166
>
> > The mark_readonly and mark_rodata_ro are not being called early (and
> > adding rodata=n to the command line doesn't fix it), so it must be that
> > the default protection state is being set to read-only.
> >
> > Digging deeper I've narrowed it down to the call to set_kernel_text_ro
> > from the __ftrace_modify_code function in arch/arm/kernel/ftrace.c. It
> > uses the same mechanism and section data as mark_rodata_ro. My theory
> > is:
> >
> > 1. Data which is marked __ro_after_init is initially writeable.
> > 2. The ro_perms data covers kernel text, read-only data and
> > __ro_after_init data.
> > 3. set_kernel_text_rw marks everything in ro_perms as writeable.
> > 4. set_kernel_text_ro marks everything in ro_perms as read-only,
> > including the __ro_after_init data.
> > 5. Using the function tracing code involves code modification,
> > resulting in calls to __ftrace_modify_code and set_kernel_text_ro.
> > 6. Therefore if function tracing is enabled before kernel_init has
> > completed then the __ro_after_init data is made read-only prematurely.
> >
> > This doesn't look like a Pi-specific bug/feature. A fix would require
> > separating out the section permission data for the __ro_after_init,
> > updating the mark_rodata_ro to use the new permissions data instead.
>
> so long,
>
> Hias
More information about the linux-rpi-kernel
mailing list