Random reboots on ODROID-N2+

Stefan Agner stefan at agner.ch
Tue Jun 22 00:39:23 PDT 2021


On 2021-05-17 11:14, Stefan Agner wrote:
> Hi,
> 
> We are currently testing a new release using Linux 5.10.33. Since then
> I've received several reports of random reboots every couple of days.
> Unfortunately the log (journald) doesn't show anything, just a hard cut
> at some point.
> 
> After attaching a serial console to several instances, I was able to
> catch this stack trace:
> 
> [202983.988153] SError Interrupt on CPU3, code 0xbf000000 -- SError
> [202983.988155] CPU: 3 PID: 3463 Comm: mdns-repeater Not tainted 5.10.33 #1
> [202983.988156] Hardware name: Hardkernel ODROID-N2Plus (DT)
> [202983.988157] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
> [202983.988158] pc : udp_send_skb.isra.0+0x178/0x390
> [202983.988159] lr : udp_send_skb.isra.0+0x130/0x390

<snip>

We see these crashes at a similar frequency with Linux 5.12.10:

[129988.642342] SError Interrupt on CPU4, code 0xbf000000 -- SError
[129988.642348] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.12.10 #1
[129988.642350] Hardware name: Hardkernel ODROID-N2Plus (DT)
[129988.642351] pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[129988.642352] pc : free_page_and_swap_cache+0x0/0x110
[129988.642352] lr : tlb_remove_table_rcu+0x30/0x60
[129988.642353] sp : ffff8000115bbdf0
[129988.642354] x29: ffff8000115bbdf0 x28: ffff800010103a18
[129988.642358] x27: 000000000000000a x26: ffff000000120000
[129988.642360] x25: ffff000000120000 x24: ffff8000115bbe90
[129988.642362] x23: ffff800011456680 x22: ffff0000e07df970
[129988.642365] x21: 0000000000000003 x20: 0000000000000001
[129988.642367] x19: ffff000005300000 x18: 0000000000000000
[129988.642369] x17: 0000000000000000 x16: 0000000000000000
[129988.642371] x15: 0000000000000000 x14: 0000000000000500
[129988.642373] x13: 0000000000000002 x12: 0000000000000000
[129988.642375] x11: ffff8000cf5e6000 x10: ffff000028212800
[129988.642377] x9 : 0000000000000001 x8 : 00000000fffff1b8
[129988.642379] x7 : 0000000000015f40 x6 : 0000000000000001
[129988.642381] x5 : ffff80001007cf4c x4 : 0000000000000007
[129988.642383] x3 : ffff0000e07e2e78 x2 : ffff000025a2bd00
[129988.642385] x1 : ffff800010208b60 x0 : fffffc00002e9a80
[129988.642387] Kernel panic - not syncing: Asynchronous SError Interrupt
[129988.642388] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.12.10 #1
[129988.642389] Hardware name: Hardkernel ODROID-N2Plus (DT)
[129988.642390] Call trace:
[129988.642391]  dump_backtrace+0x0/0x1a0
[129988.642392]  show_stack+0x18/0x70
[129988.642392]  dump_stack+0xd0/0x12c
[129988.642393]  panic+0x170/0x338
[129988.642394]  nmi_panic+0x8c/0x90
[129988.642395]  arm64_serror_panic+0x78/0x84
[129988.642395]  do_serror+0x38/0xa0
[129988.642396]  el1_error+0x80/0xf8
[129988.642397]  free_page_and_swap_cache+0x0/0x110
[129988.642398]  rcu_core+0x310/0x5d0
[129988.642398]  rcu_core_si+0x10/0x20
[129988.642399]  _stext+0x128/0x28c
[129988.642400]  irq_exit+0xd8/0x100
[129988.642401]  __handle_domain_irq+0x68/0xc0
[129988.642401]  gic_handle_irq+0xa8/0xe0
[129988.642402]  el1_irq+0xbc/0x180
[129988.642403]  arch_cpu_idle+0x18/0x30
[129988.642404]  default_idle_call+0x20/0x68
[129988.642404]  do_idle+0x218/0x270
[129988.642405]  cpu_startup_entry+0x24/0x70
[129988.642406]  secondary_start_kernel+0x178/0x190
[129988.642418] SMP: stopping secondary CPUs
[129988.642419] Kernel Offset: disabled
[129988.642420] CPU features: 0x00240002,61082004
[129988.642421] Memory Limit: none
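
For reference, the "code 0xbf000000" value in both traces is the arm64
exception syndrome (ESR). A minimal Python sketch for splitting it into
its architectural fields follows. The field layout (EC in bits 31:26, IL
in bit 25, ISS in bits 24:0, and IDS in ISS bit 24 for an SError) is
taken from the Arm architecture definition of ESR_ELx; the function name
decode_esr is only illustrative and is not part of any kernel tooling.

```python
# Sketch: split an arm64 ESR value into its architectural fields.
# Layout assumed from the ESR_ELx definition:
#   EC  = bits 31:26 (exception class)
#   IL  = bit 25     (instruction length)
#   ISS = bits 24:0  (instruction specific syndrome)

SERROR_EC = 0x2F  # exception class for "SError interrupt"


def decode_esr(esr: int) -> dict:
    """Return the EC, IL and ISS fields of an ESR value (illustrative helper)."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fields = {"ec": ec, "il": il, "iss": iss}
    if ec == SERROR_EC:
        # For an SError, ISS bit 24 is IDS: when set, the remaining
        # syndrome bits are implementation defined.
        fields["ids"] = (iss >> 24) & 0x1
    return fields


if __name__ == "__main__":
    for name, value in decode_esr(0xBF000000).items():
        print(f"{name}: {value:#x}")
```

For 0xbf000000 this gives EC 0x2f (SError) with IDS set and all lower
syndrome bits zero, so the code itself does not say which access or
device raised the error.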

It seems to be load and/or hardware dependent, since we see it on some
devices quite frequently (every few days), while on others it takes
multiple weeks. Of course the ones where we see it frequently are the
ones in production :).
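
To compare crash frequency across devices, the uptime at the moment of
the SError can be pulled out of captured serial console logs. The
following is a rough Python sketch, assuming the logs are saved as plain
text with the usual "[seconds.micros]" printk timestamps; the helper
name find_serrors and the record layout are hypothetical.

```python
import re

# Matches lines such as:
# [129988.642342] SError Interrupt on CPU4, code 0xbf000000 -- SError
SERROR_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+SError Interrupt on CPU(?P<cpu>\d+), "
    r"code (?P<code>0x[0-9a-fA-F]+)"
)


def find_serrors(log_text: str) -> list:
    """Return one record per SError line found in a captured console log."""
    events = []
    for line in log_text.splitlines():
        match = SERROR_RE.search(line)
        if match is None:
            continue
        uptime_seconds = float(match.group("uptime"))
        events.append({
            "cpu": int(match.group("cpu")),
            "code": int(match.group("code"), 16),
            "uptime_days": uptime_seconds / 86400,
        })
    return events


if __name__ == "__main__":
    # The two SError lines quoted in this thread.
    sample = (
        "[202983.988153] SError Interrupt on CPU3, code 0xbf000000 -- SError\n"
        "[129988.642342] SError Interrupt on CPU4, code 0xbf000000 -- SError\n"
    )
    for event in find_serrors(sample):
        print(f"CPU{event['cpu']} code {event['code']:#x} "
              f"after {event['uptime_days']:.2f} days")
```

The two traces in this thread correspond to roughly 2.3 and 1.5 days of
uptime, which fits the "every few days" pattern on the affected devices.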

I am currently trying stress-ng and other load generators to increase
the crash rate before attempting a git bisect.
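
Why the crash rate matters for bisecting can be illustrated with a small
estimate. The Python sketch below assumes crashes arrive as a Poisson
process, which is an assumption and not something measured here, so the
chance that a bad kernel survives t days without crashing is
exp(-t / MTBF). All inputs (commit count, mean time between failures,
acceptable miss probability) are hypothetical placeholders.

```python
import math


def wait_per_step(mtbf_days: float, miss_probability: float) -> float:
    """Days to run one kernel before calling it good.

    Assumes Poisson crash arrivals: P(no crash in t) = exp(-t / mtbf).
    Solving for t at the given miss probability gives -mtbf * ln(p).
    """
    return -mtbf_days * math.log(miss_probability)


def bisect_steps(commit_count: int) -> int:
    """Approximate number of kernels git bisect has to test for a range."""
    return math.ceil(math.log2(commit_count))


def worst_case_days(commit_count: int, mtbf_days: float,
                    miss_probability: float) -> float:
    """Worst case: every step looks good and must run the full wait."""
    steps = bisect_steps(commit_count)
    return steps * wait_per_step(mtbf_days, miss_probability)


if __name__ == "__main__":
    commits = 10000  # hypothetical size of the commit range
    for mtbf in (3.0, 0.25):  # hypothetical MTBF in days
        total = worst_case_days(commits, mtbf, 0.05)
        print(f"MTBF {mtbf} days: up to {total:.1f} days of testing")
```

With a mean time between crashes of three days and a 5% tolerance for
wrongly marking a kernel good, each step needs about nine days, which is
impractical over the 14 or so steps a range of that size would take.
Cutting the time between crashes to a few hours brings the whole bisect
down to days.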

--
Stefan



More information about the linux-amlogic mailing list