Random reboots on ODROID-N2+

Stefan Agner stefan at agner.ch
Tue May 18 03:15:51 PDT 2021


On 2021-05-18 03:33, Andrew Lunn wrote:
> On Mon, May 17, 2021 at 11:14:18AM +0200, Stefan Agner wrote:
>> Hi,
>>
>> We are currently testing a new release using Linux 5.10.33. I've
>> since received several reports of random reboots every couple of days.
>> Unfortunately the log (journald) doesn't show anything, just a hard cut
>> at some point.
>>
>> After running serial console on several instances, I was able to catch
>> this stack trace:
>>
>> [202983.988153] SError Interrupt on CPU3, code 0xbf000000 -- SError
>> [202983.988155] CPU: 3 PID: 3463 Comm: mdns-repeater Not tainted 5.10.33 #1
>> [202983.988156] Hardware name: Hardkernel ODROID-N2Plus (DT)
>> [202983.988157] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
>> [202983.988158] pc : udp_send_skb.isra.0+0x178/0x390
>> [202983.988159] lr : udp_send_skb.isra.0+0x130/0x390
> 
> Hi Stefan

Hi Andrew,

> 
> Could you generate net/ipv4/udp.lst so we can see what
> udp_send_skb.isra.0+0x178/0x390 is trying to do, and what bit of C
> code it maps to.

OK, I built net/ipv4/udp.lst using the same build environment (Buildroot)
that was used to build the kernel which generated the stack trace, so I
think this should add up:

ffff800010c1bb60 <udp_send_skb.isra.0>:
static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
...
                udp4_hwcsum(skb, fl4->saddr, fl4->daddr);
ffff800010c1bc78:       29450ae1        ldp     w1, w2, [x23, #40]
ffff800010c1bc7c:       aa1303e0        mov     x0, x19
ffff800010c1bc80:       94000000        bl      ffff800010c184b0 <udp4_hwcsum>
                        ffff800010c1bc80: R_AARCH64_CALL26     udp4_hwcsum
        err = ip_send_skb(sock_net(sk), skb);
ffff800010c1bc84:       f9401ac0        ldr     x0, [x22, #48]
ffff800010c1bc88:       aa1303e1        mov     x1, x19
ffff800010c1bc8c:       94000000        bl      0 <ip_send_skb>
                        ffff800010c1bc8c: R_AARCH64_CALL26     ip_send_skb
        if (err) {
ffff800010c1bc90:       350008e0        cbnz    w0, ffff800010c1bdac <udp_send_skb.isra.0+0x24c>
...
        u64 pc = READ_ONCE(ti->preempt_count);
ffff800010c1bcd4:       f9400820        ldr     x0, [x1, #16]
        WRITE_ONCE(ti->preempt.count, --pc);
ffff800010c1bcd8:       d1000400        sub     x0, x0, #0x1
ffff800010c1bcdc:       b9001020        str     w0, [x1, #16]
        return !pc || !READ_ONCE(ti->preempt_count);
...

The full udp.lst file:
https://drive.google.com/file/d/1j0RKOfuMXmCRWILpkG3uk_beohWrr-ho/view?usp=sharing

Since I only have this one trace, I am not 100% sure whether this trace
is just a random one or whether it is always the same.

But things seem to add up to me: mdns-repeater deals with UDP packets,
and it seems that the code tries to make use of HW checksumming (judging
from lr)? This would explain why only this platform shows the problem.

--
Stefan
