RISC-V regression on Linux 6.7-rc1

Charlie Jenkins charlie at rivosinc.com
Mon Nov 20 09:39:00 PST 2023


On Mon, Nov 20, 2023 at 07:54:29AM -0800, Ron Economos wrote:
> Linux 6.7-rc1 fails to boot on the HiFive Unmatched running Ubuntu 22.04.
> During boot, there are many random oops and kernel panics. Reverting the
> patch series "riscv: Add remaining module relocations and tests" (commit
> b51fc88cb35e49) resolves the issue.
> 
> Here's an example, but it does something different on each boot.
> 
> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
> virtual address ffffff97e6cb45fa
> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
> virtual address 0000005b7d944e0e
> Nov 17 21:33:38 riscv64 kernel: Oops [#1]
> Nov 17 21:33:38 riscv64 kernel: Modules linked in: sch_fq_codel auth_rpcgss
> nfs_acl drm(+) lockd grace backlight sunrpc efi_pstore ip_tables x_tables
> autofs4 btrfs blake2b_generic raid10 raid456 >
> Nov 17 21:33:38 riscv64 kernel: CPU: 1 PID: 391 Comm: cloud-init Not tainted
> 6.7.0-rc1 #2
> Nov 17 21:33:38 riscv64 kernel: Hardware name: SiFive HiFive Unmatched A00
> (DT)
> Nov 17 21:33:38 riscv64 kernel: epc : refill_obj_stock+0x4e/0x160
> Nov 17 21:33:38 riscv64 kernel:  ra : refill_obj_stock+0x4e/0x160
> Nov 17 21:33:38 riscv64 kernel: epc : ffffffff802d9264 ra : ffffffff802d9264
> sp : ffffffd898e2fd10
> Nov 17 21:33:38 riscv64 kernel:  gp : ffffffff81c4bdd8 tp : ffffffd884f9ec00
> t0 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel:  t1 : 0000000000000000 t2 : 0000000000000000
> s0 : ffffffd898e2fd60
> Nov 17 21:33:38 riscv64 kernel:  s1 : ffffffdbfed02f70 a0 : ffffffd89a2b1ec0
> a1 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel:  a2 : 0000000000000000 a3 : 0000000000000000
> a4 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel:  a5 : 0000000000000000 a6 : 0000000000000000
> a7 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel:  s2 : 31413797e6cb45fa s3 : 0000000000000108
> s4 : 0000000200000022
> Nov 17 21:33:38 riscv64 kernel:  s5 : ffffffff81cbb1e8 s6 : ffffffd884f9ec00
> s7 : 0000003fc0325028
> Nov 17 21:33:38 riscv64 kernel:  s8 : 0000000000000000 s9 : 0000002b10fdc170
> s10: 0000002b10fe2290
> Nov 17 21:33:38 riscv64 kernel:  s11: 0000000000000000 t3 : 0000000000000000
> t4 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel:  t5 : 0000000000000000 t6 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel: status: 0000000200000100 badaddr:
> ffffff97e6cb45fa cause: 000000000000000d
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802d9264>]
> refill_obj_stock+0x4e/0x160
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802dce26>]
> obj_cgroup_uncharge+0x1c/0x2a
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802b97a6>]
> kmem_cache_free+0x1b2/0x548
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802f358c>] __fput+0x132/0x252
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802f3702>] ____fput+0x18/0x22
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80043e96>] task_work_run+0xa8/0xee
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800bfc66>]
> exit_to_user_mode_loop.isra.0+0xf2/0x10e
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c4f2ac>]
> syscall_exit_to_user_mode+0x54/0x64
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c4eee0>]
> do_trap_ecall_u+0x5a/0x13a
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c5a716>]
> ret_from_exception+0x0/0x66
> Nov 17 21:33:38 riscv64 kernel: Code: 639c 94be 689c 8963 0aa7 8526 f097
> ffff 80e7 26a0 (3783) 0009
> Nov 17 21:33:38 riscv64 kernel: ---[ end trace 0000000000000000 ]---
> Nov 17 21:33:38 riscv64 kernel: Oops [#2]
> Nov 17 21:33:38 riscv64 kernel: note: cloud-init[391] exited with irqs
> disabled
> Nov 17 21:33:38 riscv64 kernel: Modules linked in: sch_fq_codel auth_rpcgss
> nfs_acl drm(+) lockd grace backlight sunrpc efi_pstore ip_tables x_tables
> autofs4 btrfs blake2b_generic raid10 raid456 >
> Nov 17 21:33:38 riscv64 kernel: CPU: 2 PID: 376 Comm: modprobe Tainted:
> G      D            6.7.0-rc1 #2
> Nov 17 21:33:38 riscv64 kernel: Hardware name: SiFive HiFive Unmatched A00
> (DT)
> Nov 17 21:33:38 riscv64 kernel: epc : __kmem_cache_alloc_node+0x286/0x2fa
> Nov 17 21:33:38 riscv64 kernel:  ra : __kmem_cache_alloc_node+0x5a/0x2fa
> Nov 17 21:33:38 riscv64 kernel: epc : ffffffff802bacb8 ra : ffffffff802baa8c
> sp : ffffffd889f07940
> Nov 17 21:33:38 riscv64 kernel:  gp : ffffffff81c4bdd8 tp : ffffffd898d84380
> t0 : ffffffd88e1c3ae0
> Nov 17 21:33:38 riscv64 kernel:  t1 : 0000940000000000 t2 : 0000000000000000
> s0 : ffffffd889f079a0
> Nov 17 21:33:38 riscv64 kernel:  s1 : ffffffd880001700 a0 : 26856bdb7d944dce
> a1 : 0000000000000717
> Nov 17 21:33:38 riscv64 kernel:  a2 : 0000000000008500 a3 : ffffffff81cbb1e8
> a4 : 26856bdb7d944e0e
> Nov 17 21:33:38 riscv64 kernel:  a5 : 0e4e947ddb6b0026 a6 : 000000000000ff00
> a7 : 0000000000000718
> Nov 17 21:33:38 riscv64 kernel:  s2 : 0000000000000000 s3 : 0000000000000dc0
> s4 : 0000000000000080
> Nov 17 21:33:38 riscv64 kernel:  s5 : ffffffffffffffff s6 : ffffffff80466e7a
> s7 : ffffffff81c4c454
> Nov 17 21:33:38 riscv64 kernel:  s8 : 0000000000000dc0 s9 : ffffffff024528c8
> s10: ffffffff81b1b7d0
> Nov 17 21:33:38 riscv64 kernel:  s11: ffffffff81b1b880 t3 : 0000000000000000
> t4 : 0000000000000000
> Nov 17 21:33:38 riscv64 kernel:  t5 : 0000000000000000 t6 : ffffffd880d57554
> Nov 17 21:33:38 riscv64 kernel: status: 0000000200000120 badaddr:
> 0000005b7d944e0e cause: 000000000000000d
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802bacb8>]
> __kmem_cache_alloc_node+0x286/0x2fa
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80247cac>] kmalloc_trace+0x30/0xac
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80466e7a>]
> eventfs_create_dir+0x46/0x158
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff8015c968>]
> event_create_dir+0xac/0x2e0
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff8015dcdc>]
> trace_module_notify+0x1d8/0x264
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80049e28>]
> notifier_call_chain+0x6c/0xe8
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80049f2c>]
> blocking_notifier_call_chain_robust+0x5a/0xc2
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800c4390>]
> load_module+0x16dc/0x1d1a
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800c4bfc>]
> init_module_from_file+0x82/0xc4
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800c4dda>]
> __riscv_sys_finit_module+0x19c/0x33a
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c4eed4>]
> do_trap_ecall_u+0x4e/0x13a
> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c5a716>]
> ret_from_exception+0x0/0x66
> Nov 17 21:33:38 riscv64 kernel: Code: 0813 f008 5613 0287 e7b3 0117 7633
> 0106 8893 0015 (6318) 8fd1
> Nov 17 21:33:38 riscv64 kernel: ---[ end trace 0000000000000000 ]---
> Nov 17 21:33:38 riscv64 systemd[1]: Finished Coldplug All udev Devices.
> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
> virtual address 0000005b7d944e0e
> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
> virtual address ffffff81e50b0eb6
> Nov 17 21:33:38 riscv64 kernel: Oops [#3]
> 

The list structure holding the relocation data was not being freed
correctly, which caused use-after-free accesses. I just sent out a patch
that fixes this ("riscv: Safely remove entries from relocation list").
Perhaps that will resolve this issue.
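
To illustrate the class of bug, here is a minimal userspace sketch of
freeing list entries without touching freed memory. It is not the code
from the patch: struct reloc_entry, reloc_list_push() and
reloc_list_free_all() are hypothetical names made up for this example,
and a plain singly linked list stands in for whatever the kernel
actually uses. The point is only that the successor pointer has to be
saved before the current entry is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a relocation record; not the kernel's struct. */
struct reloc_entry {
	unsigned long location;
	struct reloc_entry *next;
};

/* Push a new entry at the head. Returns 0 on success, -1 if malloc fails. */
int reloc_list_push(struct reloc_entry **head, unsigned long location)
{
	struct reloc_entry *e;

	assert(head != NULL);
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->location = location;
	e->next = *head;
	*head = e;
	return 0;
}

/*
 * Free every entry and return how many were freed.
 *
 * The unsafe pattern would be:
 *     for (e = *head; e; e = e->next) free(e);
 * which reads e->next after e has already been freed.
 */
size_t reloc_list_free_all(struct reloc_entry **head)
{
	struct reloc_entry *e, *next;
	size_t freed = 0;

	assert(head != NULL);
	for (e = *head; e; e = next) {
		next = e->next;	/* save the successor before freeing e */
		free(e);
		freed++;
	}
	*head = NULL;		/* leave no dangling head pointer behind */
	return freed;
}
```

In kernel code, the list_for_each_entry_safe() iterator from
<linux/list.h> exists for the same purpose: it keeps a second cursor
so the current entry can be removed and freed inside the loop.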

- Charlie



