RISC-V: patched kexec-tools on github for review/testing

Nick Kossifidis mick at ics.forth.gr
Thu Nov 25 15:43:06 PST 2021


On 2021-10-15 10:07, Alexandre Ghiti wrote:
> On Sat, Oct 9, 2021 at 3:25 PM Nick Kossifidis <mick at ics.forth.gr> 
> wrote:
>> 
>> On 2021-10-06 14:10, Alexandre Ghiti wrote:
>> >
>> > So I followed the instructions here:
>> > https://documentation.suse.com/fr-fr/sles/12-SP3/html/SLES-all/cha-tuning-kexec.html#cha-tuning-kexec-basic-usage,
>> > below the output on an Unmatched board using a vmlinux stored on a sd
>> > card:
>> >
>> > ubuntu@ubuntu:~$ sudo sbin/kexec -l vmlinux --append="$(cat
>> > /proc/cmdline)" --initrd=/boot/initrd.img
>> > Warning: No cmdline provided, using append string as cmdline
>> > Warning: No dtb provided, using /sys/firmware/fdt
>> > [ 1813.472671] INFO: task kworker/1:0:988 blocked for more than 120
>> > seconds.
>> > [ 1813.478751]       Not tainted 5.15.0-rc1+ #15
>> > [ 1813.483110] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> > disables this message.
>> > Could not find a free area of memory of 0x3000 bytes...
>> > locate_hole failed
>> >
>> > I used the Ubuntu kernel, so this is pretty large:
>> > -rwxrwxr-x 1 ubuntu ubuntu 277M Oct  5 15:47 vmlinux
>> > -rw-r--r-- 1 root root 98M Sep 21 03:25 /boot/initrd.img
>> >
>> 
>> ACK, I haven't tested initrd much TBH, I usually don't use an initrd,
>> and when I do it's a small busybox-based rootfs.
>> 
>> > Then if I don't load the initrd (I sometimes have the same warning as
>> > above) I can at least kexec the new kernel but it fails to boot:
>> >
>> > ubuntu@ubuntu:~$ sudo ./sbin/kexec -e
>> > Warning: No cmdline or append string provided
>> > Warning: No dtb provided, using /sys/firmware/fdt
>> > [...]
>> > [    0.000000] SBI v0.2 HSM extension detected
>> > [    0.000000] CPU with hartid=0 is not available
>> > [    0.000000] ------------[ cut here ]------------
>> > [    0.000000] kernel BUG at arch/riscv/kernel/smpboot.c:107!
>> > [    0.000000] Kernel BUG [#1]
>> > [    0.000000] Modules linked in:
>> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.15.0-rc1+ #15
>> > [    0.000000] Hardware name: SiFive HiFive Unmatched A00 (DT)
>> > [    0.000000] epc : setup_smp+0xcc/0x142
>> > [    0.000000]  ra : setup_smp+0xc4/0x142
>> > [    0.000000] epc : ffffffff80a04080 ra : ffffffff80a04078 sp :
>> > ffffffff81803ec0
>> > [    0.000000]  gp : ffffffff81a23220 tp : ffffffff81810500 t0 :
>> > ffffffff81a3551f
>> > [    0.000000]  t1 : ffffffffffffffff t2 : 0000000000000000 s0 :
>> > ffffffff81803f00
>> > [    0.000000]  s1 : 0000000000000000 a0 : 0000000000000000 a1 :
>> > 0000000000000000
>> > [    0.000000]  a2 : 0000000000000000 a3 : 0000000000000001 a4 :
>> > 0000000000000000
>> > [    0.000000]  a5 : ffffffff80c64500 a6 : 0000000000000004 a7 :
>> > 000000000000ff00
>> > [    0.000000]  s2 : 0000000000000005 s3 : 0000000000000000 s4 :
>> > ffffffff8118f9a8
>> > [    0.000000]  s5 : 0000000000000007 s6 : ffffffff80c0b790 s7 :
>> > 0000000080000200
>> > [    0.000000]  s8 : 0000000000000fff s9 : 0000000081000200 s10:
>> > 0000000000000018
>> > [    0.000000]  s11: 000000000000000b t3 : 0000000000ff0000 t4 :
>> > ffffffffffffffff
>> > [    0.000000]  t5 : ffffffff80c0b7a0 t6 : ffffffff81803bd8
>> > [    0.000000] status: 0000000200000100 badaddr: 0000000000000000
>> > cause: 0000000000000003
>> > [    0.000000] [<ffffffff80a04080>] setup_smp+0xcc/0x142
>> > [    0.000000] [<ffffffff80a03d88>] setup_arch+0x56a/0x590
>> > [    0.000000] [<ffffffff80a00aa2>] start_kernel+0xaa/0xa5c
>> > [    0.000000] random: get_random_bytes called from
>> > oops_exit+0x44/0x70 with crng_init=0
>> > [    0.000000] ---[ end trace 0000000000000000 ]---
>> > [    0.000000] Kernel panic - not syncing: Attempted to kill the idle
>> > task!
>> > [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill
>> > the idle task! ]---
>> >
>> > This reliably fails here.
>> >
>> 
>> This looks weird, I'll check it out (we have an unmatched here so I'll
>> try to get my hands on it sometime next week).
>> 
>> Did you try kdump ? Do you get the same error ?
> 
> kdump works fine, it fails to find the rootfs but I think my setup is
> faulty here.
> I took a quick look at kexec_relocate.S, and the use of va_pa_offset
> is also wrong here: we should use va_kernel_pa_offset, as it is used to
> modify a text address. But fixing that did not work either.
> 
> Alex
> 
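
To make the va_pa_offset vs va_kernel_pa_offset point above concrete,
here is a minimal sketch, assuming a 64-bit kernel where the kernel
image is mapped separately from the linear map. All base addresses and
function names below are illustrative placeholders, not values read
from the Unmatched board or code taken from kexec_relocate.S.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bases, chosen only for illustration. */
#define LINEAR_VA_BASE 0xffffffe000000000ULL /* start of the linear map */
#define RAM_PA_BASE    0x0000000080000000ULL /* start of RAM */
#define KERNEL_VA_BASE 0xffffffff80000000ULL /* kernel image virtual base */
#define KERNEL_PA_BASE 0x0000000080200000ULL /* kernel image load address */

/* Offset that applies to linear-map addresses (the va_pa_offset role). */
uint64_t linear_offset(void)
{
    return LINEAR_VA_BASE - RAM_PA_BASE;
}

/* Offset that applies to kernel-image addresses (the va_kernel_pa_offset role). */
uint64_t kernel_offset(void)
{
    return KERNEL_VA_BASE - KERNEL_PA_BASE;
}

/* Translate a linear-map virtual address to a physical address. */
uint64_t linear_va_to_pa(uint64_t va)
{
    return va - linear_offset();
}

/* Translate a kernel text/data virtual address to a physical address. */
uint64_t kernel_va_to_pa(uint64_t va)
{
    assert(va >= KERNEL_VA_BASE); /* only valid for kernel-image addresses */
    return va - kernel_offset();
}
```

With these placeholder values, feeding a text address such as the
setup_smp epc from the log through linear_va_to_pa() gives a physical
address outside the kernel image, while kernel_va_to_pa() lands inside
it, which is why the offset has to match the mapping the address
belongs to.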

Sorry for the delay, I finally got some time to work on this. As it
turns out, I'm passing the cpu id instead of the hart id to the next
kernel (the comment in smp.h claiming that raw_smp_processor_id returns
the hart id didn't help much). Interestingly enough, cpu id and hart id
match on qemu, and sometimes also match on the unmatched / unleashed
boards. Also, on unmatched / unleashed, hart id 0 is used by the
non-linux hart, so if we pass the cpu id of the boot cpu, which is
always going to be 0, it'll be invalid. That's not always the case
though, especially for kdump.

I'm also getting an error when trying to mount the rootfs on kdump on
the unmatched board: swiotlb can't allocate bounce buffers and the pcie
driver doesn't work, so there is no nvme access. I'm looking for a way
to make this work without messing things up; in any case I'll send some
patches over the weekend. For the initrd issue I need to patch
kexec-tools.
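
The cpu id vs hart id mix-up can be sketched as follows. This is a
minimal illustration, not the actual fix: the kernel keeps a similar
per-cpu table (see cpuid_to_hartid_map() in smp.h), but the table
contents, the INVALID_HARTID value and the function names here are
assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_HARTID (-1L)

/*
 * Hypothetical mapping for a board where hart 0 is the non-linux hart:
 * logical cpu 0 is whichever hart booted first (hart 2 here), so the
 * logical cpu id and the hart id differ.
 */
const long example_cpuid_to_hartid[] = { 2, 1, 3, 4 };

/* Look up the hart id behind a logical cpu id. */
long cpuid_to_hartid(const long *map, size_t nr_cpus, size_t cpu)
{
    assert(map != NULL);
    return cpu < nr_cpus ? map[cpu] : INVALID_HARTID;
}

/*
 * Value to hand to the next kernel as its boot hart id: translate the
 * current logical cpu id through the map instead of passing it as-is.
 */
long boot_hartid_for_next_kernel(const long *map, size_t nr_cpus,
                                 size_t current_cpu)
{
    return cpuid_to_hartid(map, nr_cpus, current_cpu);
}
```

On qemu the two ids usually coincide, so passing the raw cpu id goes
unnoticed there. With a table like the one above, passing cpu id 0
directly would name hart 0, which the next kernel does not find among
the available harts.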

Regards,
Nick


