Crash when booting RPi2 / bcm2836 with ftrace=function

Steven Rostedt rostedt at goodmis.org
Wed Aug 23 07:29:31 PDT 2017


On Mon, 21 Aug 2017 17:34:02 +0200
Matthias Reichl <hias at horus.com> wrote:

> When I enable the function tracer during boot on Raspberry Pi 2
> /bcm2836) I get a kernel crash very early in the boot process:

This is because I moved up the start time of function tracing to allow
to trace earlier in the boot process.

See commit f631718de3ca "ftrace: Move ftrace_init() to right after
memory initialization"

And commit e725c731e3b "tracing: Split tracing initialization into two
for early initialization" because people wanted to trace earlier in boot up.

I was nervous that some archs may take issue with this.

> 
> [    0.000000] Unable to handle kernel paging request at virtual address c11d00e8
> [    0.000000] pgd = c0204000
> [    0.000000] [c11d00e8] *pgd=0101940e(bad)
> [    0.000000] Internal error: Oops: 80d [#1] SMP ARM
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.8 #1
> [    0.000000] Hardware name: BCM2835
> [    0.000000] task: c1608100 task.stack: c1600000
> [    0.000000] PC is at smp_set_ops+0x2c/0x3c
> [    0.000000] LR is at 0xc11d00e8
> [    0.000000] pc : [<c14058c4>]    lr : [<c11d00e8>]    psr: a00000d3
> [    0.000000] sp : c1601f08  ip : c153af40  fp : c1601f14
> [    0.000000] r10: c1601f48  r9 : 00000200  r8 : 00000100
> [    0.000000] r7 : c1601f50  r6 : ea9f7660  r5 : 00000000  r4 : c16043bc
> [    0.000000] r3 : c14451c4  r2 : 00000000  r1 : 00000000  r0 : 00000000
> [    0.000000] Flags: NzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment none
> [    0.000000] Control: 10c5387d  Table: 0020406a  DAC: 00000051
> [    0.000000] Process swapper/0 (pid: 0, stack limit = 0xc1600220)
> [    0.000000] Stack: (0xc1601f08 to 0xc1602000)
> [    0.000000] 1f00:                   c1601f3c c1601f18 c1445374 c14058a4 c068d488 00000000
> [    0.000000] 1f20: ea9f7660 ea244300 c1601f48 00000000 c1601f84 c1601f40 c14b78e8 c1445220
> [    0.000000] 1f40: 00000010 00000000 ea2442c0 ea2442c0 c1601f50 c1601f50 c065d450 c14dca20
> [    0.000000] 1f60: c1780cc0 c1603c40 ffffffff c1780cc0 c14dca30 eafff940 c1601f94 c1601f88
> [    0.000000] 1f80: c1444fac c14b7780 c1601fac c1601f98 c1403994 c1444f98 00000000 00000000
> [    0.000000] 1fa0: c1601ff4 c1601fb0 c1400c54 c140396c ffffffff ffffffff 00000000 c1400750
> [    0.000000] 1fc0: 00000000 c14dca30 00000000 c1781294 c1603c58 c14dca2c c16094e4 0020406a
> [    0.000000] 1fe0: 410fc075 00000000 00000000 c1601ff8 0020807c c1400a18 00000000 00000000
> [    0.000000] [<c14058c4>] (smp_set_ops) from [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init+0x160/0x180)
> [    0.000000] [<c1445374>] (bcm2836_arm_irqchip_l1_intc_of_init) from [<c14b78e8>] (of_irq_init+0x174/0x2c4)
> [    0.000000] [<c14b78e8>] (of_irq_init) from [<c1444fac>] (irqchip_init+0x20/0x24)
> [    0.000000] [<c1444fac>] (irqchip_init) from [<c1403994>] (init_IRQ+0x34/0x98)
> [    0.000000] [<c1403994>] (init_IRQ) from [<c1400c54>] (start_kernel+0x248/0x3a8)
> [    0.000000] [<c1400c54>] (start_kernel) from [<0020807c>] (0x20807c)
> [    0.000000] Code: e1a0c000 e300e0e8 e8bc000f e34ce11d (e8ae000f) 
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
> [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task!
> 
> Full log is here: https://gist.github.com/HiassofT/45c4b03f4663a6d19e7a79cfbfd9c560
> 
> This was on 4.12.8, multi_v7_defconfig plus CONFIG_FUNCTION_TRACER
> selected in menuconfig and ftrace_filter=__request_module ftrace=function
> added to kernel command line.
> 
> The same crash also occurs on 4.13-rc5 and with only
> ftrace=function on the kernel command line.
> 
> Enabling the function tracer at runtime via
> echo function > /sys/kernel/tracing/current_tracer
> works as expected.
> 
> With kernel 4.9.44 the crash is a little bit different:

Interesting, because my updates happened in 4.12.

> 
> [    0.816852] Unable to handle kernel paging request at virtual address c11a4108
> [    0.824490] pgd = c0204000
> [    0.827382] [c11a4108] *pgd=0101940e(bad)
> [    0.831714] Internal error: Oops: 80d [#1] SMP ARM
> [    0.836605] Modules linked in:
> [    0.839743] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.9.44 #1
> [    0.845781] Hardware name: BCM2835
> [    0.849257] task: ea288000 task.stack: ea284000
> [    0.853899] PC is at vdso_init+0xbc/0x1b8

Interesting. vdso should not be traced, but vdso_init can be. What was
the call that it was making? That is, have an objdump of these
locations?

-- Steve


> [    0.857996] LR is at 0x0
> [    0.860590] pc : [<c13066b0>]    lr : [<00000000>]    psr: a0000013
> [    0.860590] sp : ea285ea0  ip : ea757340  fp : ea285ed4
> [    0.872297] r10: 00000000  r9 : c130060c  r8 : 00000135
> [    0.877630] r7 : c11a4108  r6 : 00000000  r5 : c11a3000  r4 : 00000001
> [    0.884290] r3 : c11a4108  r2 : eaa25be0  r1 : c16937c0  r0 : 0000165f
> [    0.890952] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [    0.898231] Control: 10c5387d  Table: 0020406a  DAC: 00000051
> [    0.904094] Process swapper/0 (pid: 1, stack limit = 0xea284220)
> [    0.910224] Stack: (0xea285ea0 to 0xea286000)
> [    0.914681] 5ea0: 00000003 c1660000 c1660000 00000135 ea285f54 c1465bf4 c13065f4 c1660000
> [    0.923028] 5ec0: c1660000 00000135 ea285f54 ea285ed8 c0302390 c1306600 c03121e0 c03e4f48
> [    0.931376] 5ee0: ea285f0c ea285ef0 c1300600 c0638fd4 00000000 c11f11a0 c0e25c00 00000135
> [    0.939723] 5f00: ea285f54 ea285f10 c036c424 c1300618 00000003 00000003 00000000 c11ef97c
> [    0.948071] 5f20: c107da58 00000000 c1660000 c1465bf4 00000003 c1660000 c1660000 00000135
> [    0.956418] 5f40: c130060c c13d1828 ea285f94 ea285f58 c1300e74 c030234c 00000003 00000003
> [    0.964765] 5f60: 00000000 c130060c aaaaaaaf 00000000 c0d1d074 00000000 00000000 00000000
> [    0.973113] 5f80: 00000000 00000000 ea285fac ea285f98 c0d1d08c c1300d1c ea284000 00000000
> [    0.981460] 5fa0: 00000000 ea285fb0 c0308af8 c0d1d080 00000000 00000000 00000000 00000000
> [    0.989806] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [    0.998153] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000 aabaaaba eaaaaaaa
> [    1.006531] [<c13066b0>] (vdso_init) from [<c0302390>] (do_one_initcall+0x50/0x180)
> [    1.014365] [<c0302390>] (do_one_initcall) from [<c1300e74>] (kernel_init_freeable+0x164/0x204)
> [    1.023260] [<c1300e74>] (kernel_init_freeable) from [<c0d1d08c>] (kernel_init+0x18/0x124)
> [    1.031713] [<c0d1d08c>] (kernel_init) from [<c0308af8>] (ret_from_fork+0x14/0x3c)
> [    1.039444] Code: e1a00620 e1a07003 e3a0e000 e0822280 (e5832000) 
> [    1.045706] ---[ end trace f660ee60085b3a9f ]---
> [    1.050670] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [    1.050670] 
> [    1.060016] CPU2: stopping
> [    1.062793] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D         4.9.44 #1
> [    1.070070] Hardware name: BCM2835
> [    1.073576] [<c03133dc>] (unwind_backtrace) from [<c030d610>] (show_stack+0x20/0x24)
> [    1.081498] [<c030d610>] (show_stack) from [<c063220c>] (dump_stack+0xa0/0xb4)
> [    1.088889] [<c063220c>] (dump_stack) from [<c031118c>] (handle_IPI+0x2e0/0x300)
> [    1.096456] [<c031118c>] (handle_IPI) from [<c03015a0>] (bcm2836_arm_irqchip_handle_irq+0x84/0xb4)
> [    1.105613] [<c03015a0>] (bcm2836_arm_irqchip_handle_irq) from [<c030e2cc>] (__irq_svc+0x6c/0x90)
> [    1.114662] Exception stack(0xea2bdf48 to 0xea2bdf90)
> [    1.119823] df40:                   00000001 00000000 00000000 c031f960 ea2bc000 c1503034
> [    1.128170] df60: c1503098 00000004 00000000 00000000 c15030a0 ea2bdfa4 c0309598 ea2bdf98
> [    1.136513] df80: c03095c8 c03095cc 60000013 ffffffff
> [    1.141690] [<c030e2cc>] (__irq_svc) from [<c03095cc>] (arch_cpu_idle+0x48/0x4c)
> [    1.149256] [<c03095cc>] (arch_cpu_idle) from [<c0d23060>] (default_idle_call+0x30/0x3c)
> [    1.157529] [<c0d23060>] (default_idle_call) from [<c038fc30>] (cpu_startup_entry+0x1ac/0x230)
> [    1.166334] [<c038fc30>] (cpu_startup_entry) from [<c0310c5c>] (secondary_start_kernel+0x128/0x134)
> [    1.175577] [<c0310c5c>] (secondary_start_kernel) from [<0030210c>] (0x30210c)
> [ 1.182949] CPU3: stopping
> 
> Full trace is here: https://gist.github.com/HiassofT/03b3685b53ab67a0b3e8eb9cf8f91ccd
> 
> I've got similar crashes on the downstream RPi kernels 4.9 and 4.12
> and Phil Elwell posted his analysis of the 4.9 problem in the github
> issue https://github.com/raspberrypi/linux/issues/2166
> 
> > The mark_readonly and mark_rodata_ro are not being called early (and
> > adding rodata=n to the command line doesn't fix it), so it must be that
> > the default protection state is being set to read-only.
> > 
> > Digging deeper I've narrowed it down to the call to set_kernel_text_ro
> > from the __ftrace_modify_code function in arch/arm/kernel/ftrace.c. It
> > uses the same mechanism and section data as mark_rodata_ro. My theory
> > is:
> > 
> > 1. Data which is marked __ro_after_init is initially writeable.
> > 2. The ro_perms data covers kernel text, read-only data and
> >    __ro_after_init data.
> > 3. set_kernel_text_rw marks everything in ro_perms as writeable.
> > 4. set_kernel_text_ro marks everything in ro_perms as read-only,
> >    including the __ro_after_init data.
> > 5. Using the function tracing code involves code modification,
> >    resulting in calls to __ftrace_modify_code and set_kernel_text_ro.
> > 6. Therefore if function tracing is enabled before kernel_init has
> >    completed then the __ro_after_init data is made read-only prematurely.
> > 
> > This doesn't look like a Pi-specific bug/feature. A fix would require
> > separating out the section permission data for the __ro_after_init,
> > updating the mark_rodata_ro to use the new permissions data instead.  
> 
> so long,
> 
> Hias




More information about the linux-rpi-kernel mailing list