[PATCH v2 1/3] RISC-V: Add kexec support
Nick Kossifidis
mick at ics.forth.gr
Thu Jul 16 10:57:04 EDT 2020
Στις 2020-07-11 06:59, Palmer Dabbelt έγραψε:
> On Tue, 23 Jun 2020 08:05:10 PDT (-0700), mick at ics.forth.gr wrote:
>> +
>> +config KEXEC
>> + bool "Kexec system call"
>> + select KEXEC_CORE
>> + select HOTPLUG_CPU if SMP
>
> This needs an S-mode dependency, as this definately won't run as-is in
> M mode.
> While we might be fix up the CSRs pretty quickly, but as we don't
> really have
> any standard-smelling M-mode platforms
>
Good point, I'll add a dependency on MMU for now and work on supporting
NOMMU platforms at a later stage.
>> + /*
>> + * When we switch SATP.MODE to "Bare" we'll only
>> + * play with physical addresses. However the first time
>> + * we try to jump somewhere, the offset on the jump
>> + * will be relative to pc which will still be on VA. To
>> + * deal with this we set stvec to the physical address at
>> + * the start of the loop below so that we jump there in
>> + * any case.
>> + */
>> + la s8, 1f
>> + sub s8, s8, s7
>> + csrw stvec, s8
>
> stvec needs to be aligned.
>
I thought the compiler would align the address of 1: since it's a label.
Would an ".align 8" after the label do the trick ?
>> +
>> + /* Process entries in a loop */
>> +1:
>> + addi s10, s10, 1
>> + REG_L t0, 0(s0) /* t0 = *image->entry */
>> + addi s0, s0, RISCV_SZPTR /* image->entry++ */
>> +
>> + /* IND_DESTINATION entry ? -> save destination address */
>> + andi t1, t0, 0x1
>> + beqz t1, 2f
>> + andi s4, t0, ~0x1
>> + j 1b
>> +
>> +2:
>> + /* IND_INDIRECTION entry ? -> update next entry ptr (PA) */
>> + andi t1, t0, 0x2
>> + beqz t1, 2f
>> + andi s0, t0, ~0x2
>> + addi s9, s9, 1
>> + csrw sptbr, zero
>
> If I understand correctly (it's ambiguous in the ISA right now), we
> don't need
> a fence here. I've opened
> https://github.com/riscv/riscv-isa-manual/issues/538
> for clarification.
>
Thanks, that was also my understanding since we don't modify the page
table but we just skip it.
>> +4:
>> + /* Wait for the relocation to be visible by other harts */
>> + fence w,w
>
> That's not how fences work. A "fence w,w" just ensures that prior
> writes are
> visible before subsequent writes. Usually that would be some sort of
> control
> write being ordered after the data writes, which on the other harts
> would be
> paired with a "fence r,r" to make sure the control read is seen before
> any
> other reads. I don't see that, though.
>
> I think the right answer here is to ignore ordering here, as we're
> operating in
> a single processor mode, and instead push the fences into the CPU
> hotplug up
> codepath. We have to handle that anyway, so we might as well just do
> it once.
>
Good catch, that line was a leftover from debuging CPU_HOTPLUG, I'll
clean it up. The fence.i later on is the important one.
>> + /* Copy the assembler code for relocation to the control page */
>> + control_code_buffer = page_address(image->control_code_page);
>> + memcpy(control_code_buffer, riscv_kexec_relocate,
>> + riscv_kexec_relocate_size);
>
> The toolchain doesn't make any guarantees that you can move code around
> without
> -fPIC, but we're already taking advantage of the fact that simply
> medany code
> can be moved around so this should be OK. This is all assembly and it
> looks
> fine, but since it's placed in the rodata segment there's at least a
> reasonably possibility that GP could end up close enough that things
> get
> relaxed. We'd be safer adding .norelax to that assembly file, as
> there's
> nothing in there that's performance critical.
>
Good point, I assumed PC-relative addressing. Should I also add medany
to the CFLAGS of machine_kexec.c or add a dependency on CMODEL_MEDANY ?
> Also, as far as I can tell nothing is checking that
> riscv_kexec_relocate_size
> is smaller than a page. That could cause overflows in this memcpy, but
> a
> static check should be sufficient.
>
A page is huge compared to the relocation code plus we already know its
size at compile time so I thought it was an overkill to check it at
runtime every time. I'll see if I can do something with the preprocessor
there or I'll add a runtime check just in case.
>> +
>> + /* Mark the control page executable */
>> + set_memory_x((unsigned long) control_code_buffer, 1);
>> +
>> +#ifdef CONFIG_SMP
>> + /*
>> + * Make sure other harts see the copied data
>> + * if they try to read the control buffer
>> + */
>> + smp_wmb();
>> +#endif
>
> This appears to have similar issues as the fence.
>
> Also, smp_wmb() already has the relevant CONFIG_SMP checks.
>
I thought it would be safer to add a write memory barrier here, on the
other hand this code is executed upon loading the image so it doesn't
make much sense, I'll clean it up.
>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 05669c87a..778dc191c 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -42,6 +42,7 @@
>> #define KEXEC_ARCH_MIPS_LE (10 << 16)
>> #define KEXEC_ARCH_MIPS ( 8 << 16)
>> #define KEXEC_ARCH_AARCH64 (183 << 16)
>> +#define KEXEC_ARCH_RISCV (243 << 16)
>
> I usually split the UAPI changes out as their own patch. This one is
> pretty
> trivial, but it's always nice to make sure everyone knows we're making
> a UAPI
> change just to be on the safe side.
>
Good point, the reason I put this here is so that the patch is complete
and compiles fine (we include this on include/asm/kexec.h).
More information about the linux-riscv
mailing list