RISC-V regression on Linux 6.7-rc1

Ron Economos re at w6rz.net
Mon Nov 20 10:52:20 PST 2023


On 11/20/23 9:39 AM, Charlie Jenkins wrote:
> On Mon, Nov 20, 2023 at 07:54:29AM -0800, Ron Economos wrote:
>> Linux 6.7-rc1 fails to boot on the HiFive Unmatched running Ubuntu 22.04.
>> During boot, there are many random oops and kernel panics. Reverting the
>> patch series "riscv: Add remaining module relocations and tests" (commit
>> b51fc88cb35e49) resolves the issue.
>>
>> Here's an example, but it does something different on each boot.
>>
>> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
>> virtual address ffffff97e6cb45fa
>> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
>> virtual address 0000005b7d944e0e
>> Nov 17 21:33:38 riscv64 kernel: Oops [#1]
>> Nov 17 21:33:38 riscv64 kernel: Modules linked in: sch_fq_codel auth_rpcgss
>> nfs_acl drm(+) lockd grace backlight sunrpc efi_pstore ip_tables x_tables
>> autofs4 btrfs blake2b_generic raid10 raid456 >
>> Nov 17 21:33:38 riscv64 kernel: CPU: 1 PID: 391 Comm: cloud-init Not tainted
>> 6.7.0-rc1 #2
>> Nov 17 21:33:38 riscv64 kernel: Hardware name: SiFive HiFive Unmatched A00
>> (DT)
>> Nov 17 21:33:38 riscv64 kernel: epc : refill_obj_stock+0x4e/0x160
>> Nov 17 21:33:38 riscv64 kernel:  ra : refill_obj_stock+0x4e/0x160
>> Nov 17 21:33:38 riscv64 kernel: epc : ffffffff802d9264 ra : ffffffff802d9264
>> sp : ffffffd898e2fd10
>> Nov 17 21:33:38 riscv64 kernel:  gp : ffffffff81c4bdd8 tp : ffffffd884f9ec00
>> t0 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel:  t1 : 0000000000000000 t2 : 0000000000000000
>> s0 : ffffffd898e2fd60
>> Nov 17 21:33:38 riscv64 kernel:  s1 : ffffffdbfed02f70 a0 : ffffffd89a2b1ec0
>> a1 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel:  a2 : 0000000000000000 a3 : 0000000000000000
>> a4 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel:  a5 : 0000000000000000 a6 : 0000000000000000
>> a7 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel:  s2 : 31413797e6cb45fa s3 : 0000000000000108
>> s4 : 0000000200000022
>> Nov 17 21:33:38 riscv64 kernel:  s5 : ffffffff81cbb1e8 s6 : ffffffd884f9ec00
>> s7 : 0000003fc0325028
>> Nov 17 21:33:38 riscv64 kernel:  s8 : 0000000000000000 s9 : 0000002b10fdc170
>> s10: 0000002b10fe2290
>> Nov 17 21:33:38 riscv64 kernel:  s11: 0000000000000000 t3 : 0000000000000000
>> t4 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel:  t5 : 0000000000000000 t6 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel: status: 0000000200000100 badaddr:
>> ffffff97e6cb45fa cause: 000000000000000d
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802d9264>]
>> refill_obj_stock+0x4e/0x160
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802dce26>]
>> obj_cgroup_uncharge+0x1c/0x2a
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802b97a6>]
>> kmem_cache_free+0x1b2/0x548
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802f358c>] __fput+0x132/0x252
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802f3702>] ____fput+0x18/0x22
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80043e96>] task_work_run+0xa8/0xee
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800bfc66>]
>> exit_to_user_mode_loop.isra.0+0xf2/0x10e
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c4f2ac>]
>> syscall_exit_to_user_mode+0x54/0x64
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c4eee0>]
>> do_trap_ecall_u+0x5a/0x13a
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c5a716>]
>> ret_from_exception+0x0/0x66
>> Nov 17 21:33:38 riscv64 kernel: Code: 639c 94be 689c 8963 0aa7 8526 f097
>> ffff 80e7 26a0 (3783) 0009
>> Nov 17 21:33:38 riscv64 kernel: ---[ end trace 0000000000000000 ]---
>> Nov 17 21:33:38 riscv64 kernel: Oops [#2]
>> Nov 17 21:33:38 riscv64 kernel: note: cloud-init[391] exited with irqs
>> disabled
>> Nov 17 21:33:38 riscv64 kernel: Modules linked in: sch_fq_codel auth_rpcgss
>> nfs_acl drm(+) lockd grace backlight sunrpc efi_pstore ip_tables x_tables
>> autofs4 btrfs blake2b_generic raid10 raid456 >
>> Nov 17 21:33:38 riscv64 kernel: CPU: 2 PID: 376 Comm: modprobe Tainted:
>> G      D            6.7.0-rc1 #2
>> Nov 17 21:33:38 riscv64 kernel: Hardware name: SiFive HiFive Unmatched A00
>> (DT)
>> Nov 17 21:33:38 riscv64 kernel: epc : __kmem_cache_alloc_node+0x286/0x2fa
>> Nov 17 21:33:38 riscv64 kernel:  ra : __kmem_cache_alloc_node+0x5a/0x2fa
>> Nov 17 21:33:38 riscv64 kernel: epc : ffffffff802bacb8 ra : ffffffff802baa8c
>> sp : ffffffd889f07940
>> Nov 17 21:33:38 riscv64 kernel:  gp : ffffffff81c4bdd8 tp : ffffffd898d84380
>> t0 : ffffffd88e1c3ae0
>> Nov 17 21:33:38 riscv64 kernel:  t1 : 0000940000000000 t2 : 0000000000000000
>> s0 : ffffffd889f079a0
>> Nov 17 21:33:38 riscv64 kernel:  s1 : ffffffd880001700 a0 : 26856bdb7d944dce
>> a1 : 0000000000000717
>> Nov 17 21:33:38 riscv64 kernel:  a2 : 0000000000008500 a3 : ffffffff81cbb1e8
>> a4 : 26856bdb7d944e0e
>> Nov 17 21:33:38 riscv64 kernel:  a5 : 0e4e947ddb6b0026 a6 : 000000000000ff00
>> a7 : 0000000000000718
>> Nov 17 21:33:38 riscv64 kernel:  s2 : 0000000000000000 s3 : 0000000000000dc0
>> s4 : 0000000000000080
>> Nov 17 21:33:38 riscv64 kernel:  s5 : ffffffffffffffff s6 : ffffffff80466e7a
>> s7 : ffffffff81c4c454
>> Nov 17 21:33:38 riscv64 kernel:  s8 : 0000000000000dc0 s9 : ffffffff024528c8
>> s10: ffffffff81b1b7d0
>> Nov 17 21:33:38 riscv64 kernel:  s11: ffffffff81b1b880 t3 : 0000000000000000
>> t4 : 0000000000000000
>> Nov 17 21:33:38 riscv64 kernel:  t5 : 0000000000000000 t6 : ffffffd880d57554
>> Nov 17 21:33:38 riscv64 kernel: status: 0000000200000120 badaddr:
>> 0000005b7d944e0e cause: 000000000000000d
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff802bacb8>]
>> __kmem_cache_alloc_node+0x286/0x2fa
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80247cac>] kmalloc_trace+0x30/0xac
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80466e7a>]
>> eventfs_create_dir+0x46/0x158
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff8015c968>]
>> event_create_dir+0xac/0x2e0
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff8015dcdc>]
>> trace_module_notify+0x1d8/0x264
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80049e28>]
>> notifier_call_chain+0x6c/0xe8
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80049f2c>]
>> blocking_notifier_call_chain_robust+0x5a/0xc2
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800c4390>]
>> load_module+0x16dc/0x1d1a
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800c4bfc>]
>> init_module_from_file+0x82/0xc4
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff800c4dda>]
>> __riscv_sys_finit_module+0x19c/0x33a
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c4eed4>]
>> do_trap_ecall_u+0x4e/0x13a
>> Nov 17 21:33:38 riscv64 kernel: [<ffffffff80c5a716>]
>> ret_from_exception+0x0/0x66
>> Nov 17 21:33:38 riscv64 kernel: Code: 0813 f008 5613 0287 e7b3 0117 7633
>> 0106 8893 0015 (6318) 8fd1
>> Nov 17 21:33:38 riscv64 kernel: ---[ end trace 0000000000000000 ]---
>> Nov 17 21:33:38 riscv64 systemd[1]: Finished Coldplug All udev Devices.
>> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
>> virtual address 0000005b7d944e0e
>> Nov 17 21:33:38 riscv64 kernel: Unable to handle kernel paging request at
>> virtual address ffffff81e50b0eb6
>> Nov 17 21:33:38 riscv64 kernel: Oops [#3]
>>
> The list structure that was holding the relocation data was not being
> freed correctly, causing some accesses after free. I just sent out a
> patch fixing that (riscv: Safely remove entries from relocation list).
> Perhaps that will solve this issue.
>
> - Charlie
>
That seems better, but I'm still having problems. I've seen the following
"process_accumulated_relocations" oops twice.

Nov 20 10:45:10 riscv64 kernel: Unable to handle kernel access to user 
memory without uaccess routines at virtual address 0000000000646572
Nov 20 10:45:10 riscv64 kernel: Oops [#1]
Nov 20 10:45:10 riscv64 kernel: Modules linked in: sunrpc(+) efi_pstore 
backlight ip_tables x_tables autofs4 btrfs blake2b_generic raid10 
raid456 async_raid6_recov async_memcpy async_pq >
Nov 20 10:45:10 riscv64 kernel: RPC: Registered named UNIX socket 
transport module.
Nov 20 10:45:10 riscv64 kernel: CPU: 2 PID: 375 Comm: modprobe Not 
tainted 6.7.0-rc2 #2
Nov 20 10:45:10 riscv64 kernel: Hardware name: SiFive HiFive Unmatched 
A00 (DT)
Nov 20 10:45:10 riscv64 kernel: epc : 
process_accumulated_relocations+0x58/0x150
Nov 20 10:45:10 riscv64 kernel:  ra : apply_relocate_add+0x12a/0x2f8
Nov 20 10:45:10 riscv64 kernel: epc : ffffffff8000a73c ra : 
ffffffff8000ac10 sp : ffffffd89a8abaa0
Nov 20 10:45:10 riscv64 kernel:  gp : ffffffff81c4be10 tp : 
ffffffd89b438000 t0 : 0000000000000017
Nov 20 10:45:10 riscv64 kernel:  t1 : ffffffff02413fb8 t2 : 
0000000000002e16 s0 : ffffffd89a8abb20
Nov 20 10:45:10 riscv64 kernel:  s1 : ffffffc8048b3080 a0 : 
ffffffff0241b900 a1 : 0000000000000000
Nov 20 10:45:10 riscv64 kernel:  a2 : 0000000000000000 a3 : 
0000000000000000 a4 : ffffffff81c5d140
Nov 20 10:45:10 riscv64 kernel:  a5 : 0000000000646572 a6 : 
0000000000000013 a7 : ffffffff81201900
Nov 20 10:45:10 riscv64 kernel: RPC: Registered udp transport module.
Nov 20 10:45:10 riscv64 kernel:  s2 : ffffffc80494c6c8 s3 : 
ffffffc80492d908 s4 : 0000000000000000
Nov 20 10:45:10 riscv64 kernel:  s5 : ffffffff0241b900 s6 : 
ffffffff812017d0 s7 : ffffffc8048d13b8
Nov 20 10:45:10 riscv64 kernel:  s8 : ffffffc80494b688 s9 : 
ffffffd884eb8ac0 s10: 6e655f6568636163
Nov 20 10:45:10 riscv64 kernel: RPC: Registered tcp transport module.
Nov 20 10:45:10 riscv64 kernel:  s11: ffffffff81b081c0 t3 : 
ffffffff80c2e7da t4 : ffffffff8000a32c
Nov 20 10:45:10 riscv64 kernel:  t5 : 0000000000003d8e t6 : 0000000000003d8c
Nov 20 10:45:10 riscv64 kernel: status: 0000000200000120 badaddr: 
0000000000646572 cause: 000000000000000d
Nov 20 10:45:10 riscv64 kernel: [<ffffffff8000a73c>] 
process_accumulated_relocations+0x58/0x150
Nov 20 10:45:10 riscv64 kernel: RPC: Registered tcp-with-tls transport 
module.
Nov 20 10:45:10 riscv64 kernel: [<ffffffff8000ac10>] 
apply_relocate_add+0x12a/0x2f8
Nov 20 10:45:10 riscv64 kernel: [<ffffffff800c41c0>] 
load_module+0x14f8/0x1d1a
Nov 20 10:45:10 riscv64 kernel: RPC: Registered tcp NFSv4.1 backchannel 
transport module.
Nov 20 10:45:10 riscv64 kernel: [<ffffffff800c4c10>] 
init_module_from_file+0x82/0xc4
Nov 20 10:45:10 riscv64 kernel: [<ffffffff800c4dee>] 
__riscv_sys_finit_module+0x19c/0x33a
Nov 20 10:45:10 riscv64 kernel: [<ffffffff80c4ef6c>] 
do_trap_ecall_u+0x4e/0x13a
Nov 20 10:45:10 riscv64 kernel: [<ffffffff80c5a7ae>] 
ret_from_exception+0x0/0x66
Nov 20 10:45:10 riscv64 kernel: Code: 0b13 0acb f4a6 f0ca ecce fc5e f862 
ec6e b783 010c (bb83) 0007
Nov 20 10:45:10 riscv64 kernel: ---[ end trace 0000000000000000 ]---
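
One detail in the register dump may be worth noting: the faulting address in
a5 (0000000000646572) and the value in s10 (6e655f6568636163) both look like
printable ASCII rather than pointers. The snippet below is a minimal sketch for
checking that, not anything taken from the kernel; the helper name
reg_as_ascii is made up for illustration, and it assumes the registers hold
little-endian bytes, as on RV64.

```python
def reg_as_ascii(value: int, width: int = 8) -> str:
    """Render a register value as little-endian bytes.

    Hypothetical helper for reading oops dumps: printable ASCII bytes are
    shown as characters, everything else as '.'.
    """
    raw = value.to_bytes(width, "little")
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    # Values copied from the oops above.
    print(reg_as_ascii(0x0000000000646572))  # a5 / badaddr -> "red....."
    print(reg_as_ascii(0x6E655F6568636163))  # s10          -> "cache_en"
```

If that reading is right, it would be consistent with a list pointer being
loaded from memory that holds string data (for example, freed or overwritten
memory), but the decoding alone does not prove where the corruption comes from.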



