do page fault in atomic bug on arm

Masami Hiramatsu mhiramat at kernel.org
Sun Nov 26 17:34:13 PST 2017


On Sun, 26 Nov 2017 20:07:43 +0800
Alex Shi <alex.shi at linaro.org> wrote:

> CC Masami Hiramatsu for ftracetest part.
> 
> Hi Russell King,
> 
> Thanks a lot for quick response!
> 
> Regards
> Alex
> 
> On 11/25/2017 03:27 AM, Russell King - ARM Linux wrote:
> > On Fri, Nov 24, 2017 at 11:09:30PM +0800, Alex Shi wrote:
> >> Full agree with your analysis. Is it possible to stain PC value with
> >> heavy stress on thermal or sth else? the ARM64 board run well with
> >> ftracetest of LTP.
> > 
> > In your first email, you said "x15 platform, which is a armv7 board."
> > Here you say "ARM64 board" which isn't armv7.  There's x15 DTS under
> > arch/arm/boot/dts, so I guess you mean 32-bit ARM, but who knows...
> > 
> > Anyway, I've tried running ftracetest on an OMAP4430 SDP board, and
> > after a while with the patch I sent you, I get:
> > 
> > Internal error: Oops - BUG: 0 [#1] SMP ARM
> > Modules linked in:
> > CPU: 1 PID: 2948 Comm: ftracetest Not tainted 4.14.0+ #557
> > Hardware name: Generic OMAP4 (Flattened Device Tree)
> > task: ce41c100 task.stack: cc7b8000
> > PC is at oops+0x0/0x4
> > LR is at trace_hardirqs_on_caller+0x154/0x1e0
> > pc : [<c0015adc>]    lr : [<c0086840>]    psr: 20000193
> > sp : cc7b9fb0  ip : cc7b9f80  fp : 00000000
> > r10: 00000000  r9 : cc7b8000  r8 : c0015c28
> > r7 : 00000006  r6 : 00000004  r5 : 0009fed4  r4 : 00000001
> > r3 : 00000000  r2 : cc7b9fb0  r1 : 60000193  r0 : 00000001
> > Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
> > Control: 10c5387d  Table: 8e7b804a  DAC: 00000055
> > Process ftracetest (pid: 2948, stack limit = 0xcc7b8210)
> > Stack: (0xcc7b9fb0 to 0xcc7ba000)
> > 9fa0:                                     00000000 00000000 0009b008 0000000d
> > 9fc0: 00000001 0009fed4 00000004 00000006 0009b3e0 000a17d4 00000000 beaf6e3c
> > 9fe0: 0009fed0 beaf6e20 000319e0 b6e7199c 60000193 00000001 6b6b6b6b a56b6b6b
> > Backtrace: no frame pointer
> > Code: e9527fff e1a00000 e28dd048 e1b0f00e (e7f001f2)
> > ---[ end trace 390efe5843605357 ]---
> > 
> > The other CPU also oopses:
> > 
> > Internal error: Oops - BUG: 0 [#3] SMP ARM
> > Modules linked in:
> > CPU: 1 PID: 1 Comm: init Tainted: G      D         4.14.0+ #557
> > Hardware name: Generic OMAP4 (Flattened Device Tree)
> > task: ced04c00 task.stack: ced06000
> > PC is at oops+0x0/0x4
> > LR is at trace_hardirqs_on+0x14/0x18
> > pc : [<c0015adc>]    lr : [<c00868e0>]    psr: 20000193
> > sp : ced07fb0  ip : ced07fa0  fp : 00000000
> > r10: 00000000  r9 : ced06000  r8 : c0015c28
> > r7 : 0000004e  r6 : bec0acd4  r5 : 000176b4  r4 : bec0ac3c
> > r3 : 00000000  r2 : ced07fb0  r1 : 60000193  r0 : c0015aa8
> > Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
> > Control: 10c5387d  Table: 8e2d804a  DAC: 00000055
> > Process init (pid: 1, stack limit = 0xced06210)
> > Stack: (0xced07fb0 to 0xced08000)
> > 7fa0:                                     00000000 00000000 00000000 00000000
> > 7fc0: bec0ac3c 000176b4 bec0acd4 0000004e 10000000 00000000 0000a1d0 bec0ac44
> > 7fe0: bec0acd8 bec0ac28 b6e54544 b6e5456c 60000193 bec0ac28 005d5555 00020201
> > Backtrace: no frame pointer
> > Code: e9527fff e1a00000 e28dd048 e1b0f00e (e7f001f2)
> > ---[ end trace 390efe5843605358 ]---
> > 
> > which is exactly your bug, but caught a bit earlier.
> > 
> > This happens while executing this ftrace test:
> > 
> > [28] Register/unregister many kprobe events
> > 
> > and needs a kernel with ftrace and kprobes enabled.
> > 
> > Unfortunately, the debug is immediately after a call to
> > trace_hardirqs_on() in no_work_pending, so the LR value is
> > meaningless.
> > 
> > So, now that we know it's tracing kprobes triggering it - it's
> > trying to set tracepoints on the first 256 symbols in the kernel's
> > kallsyms, which includes all sorts of things.
> > 
> > With some extra debug, this doesn't look clever:
> > 
> > trace_kprobe: Inserting kprobe at ret_fast_syscall+0
> > trace_kprobe: Inserting kprobe at slow_work_pending+0
> > trace_kprobe: Inserting kprobe at ret_slow_syscall+0
> > trace_kprobe: Could not insert probe at ret_slow_syscall+0: -22
> > trace_kprobe: Inserting kprobe at ret_to_user+0
> > trace_kprobe: Could not insert probe at ret_to_user+0: -22
> > trace_kprobe: Inserting kprobe at ret_to_user_from_irq+0
> > trace_kprobe: Inserting kprobe at no_work_pending+0
> > trace_kprobe: Inserting kprobe at oops+0
> > trace_kprobe: Could not insert probe at oops+0: -22
> > trace_kprobe: Inserting kprobe at ret_from_fork+0
> > trace_kprobe: Inserting kprobe at vector_swi+0
> > trace_kprobe: Inserting kprobe at local_restart+0
> > trace_kprobe: Inserting kprobe at __sys_trace+0
> > trace_kprobe: Inserting kprobe at __sys_trace_return+0
> > trace_kprobe: Inserting kprobe at __sys_trace_return_nosave+0
> > trace_kprobe: Could not insert probe at __sys_trace_return_nosave+0: -22
> > trace_kprobe: Inserting kprobe at __cr_alignment+0
> > trace_kprobe: Could not insert probe at __cr_alignment+0: -22
> > trace_kprobe: Inserting kprobe at sys_call_table+0
> > trace_kprobe: Inserting kprobe at sys_syscall+0
> > trace_kprobe: Inserting kprobe at sys_sigreturn_wrapper+0
> > trace_kprobe: Inserting kprobe at sys_rt_sigreturn_wrapper+0
> > trace_kprobe: Inserting kprobe at sys_statfs64_wrapper+0
> > trace_kprobe: Inserting kprobe at sys_fstatfs64_wrapper+0
> > 
> > I wouldn't be surprised if some of those were the cause of it -
> > for example, what guarantee do we have that a trace kprobe inserted
> > at ret_fast_syscall which starts with this:
> > 
> > c0015a40:       e5ad0008        str     r0, [sp, #8]!
> > 
> > will be handled correctly?  I can't say, I've virtually no knowledge
> > about kprobes, but I guess it isn't - especially as there's this
> > comment in the ARM kprobes code:
> > 
> >          * Never instrument insn like 'str r0, [sp, +/-r1]'. Also, insn likes
> >          * 'str r0, [sp, #-68]' should also be prohibited.
> > 
> > Clearly, that's not the case as the kprobes insert on
> > ret_fast_syscall succeeded.
> > 
> > Adding Tixy, as he's more knowledgable in this area - I suggest
> > someone knowledgable in this area runs
> > 
> > 	ftracetest test.d/kprobe/multiple_kprobes.tc
> > 
> > and fixes these bugs... also running the entire ftracetest suite
> > would probably also be a very good idea.

This test case is to test whether we can safely put multiple kprobes
on different functions or not. If it causes kernel crashes, it means
kprobes has a bug on that function (the last "Inserting kprobe at ...").

Since the ftrace kprobe event interface is opened to userspace (but
also requires root priv), it must be fixed.

BTW, the test is not care about failure or blacklisted functions.
I'll update it to check blacklist functions if available.

Thank you,

-- 
Masami Hiramatsu <mhiramat at kernel.org>



More information about the linux-arm-kernel mailing list