[PATCH v2 1/3] RISC-V: Add kexec support

Thu Jul 16 10:57:04 EDT 2020

Στις 2020-07-11 06:59, Palmer Dabbelt έγραψε:
> On Tue, 23 Jun 2020 08:05:10 PDT (-0700), mick at ics.forth.gr wrote:
>> +
>> +config KEXEC
>> +	bool "Kexec system call"
>> +	select KEXEC_CORE
>> +	select HOTPLUG_CPU if SMP
> 
> This needs an S-mode dependency, as this definately won't run as-is in 
> M mode.
> While we might be fix up the CSRs pretty quickly, but as we don't 
> really have
> any standard-smelling M-mode platforms
> 

Good point, I'll add a dependency on MMU for now and work on supporting 
NOMMU platforms at a later stage.

>> +	/*
>> +	 * When we switch SATP.MODE to "Bare" we'll only
>> +	 * play with physical addresses. However the first time
>> +	 * we try to jump somewhere, the offset on the jump
>> +	 * will be relative to pc which will still be on VA. To
>> +	 * deal with this we set stvec to the physical address at
>> +	 * the start of the loop below so that we jump there in
>> +	 * any case.
>> +	 */
>> +	la	s8, 1f
>> +	sub	s8, s8, s7
>> +	csrw	stvec, s8
> 
> stvec needs to be aligned.
> 

I thought the compiler would align the address of 1: since it's a label. 
Would an ".align 8" after the label do the trick ?

>> +
>> +	/* Process entries in a loop */
>> +1:
>> +	addi	s10, s10, 1
>> +	REG_L	t0, 0(s0)		/* t0 = *image->entry */
>> +	addi	s0, s0, RISCV_SZPTR	/* image->entry++ */
>> +
>> +	/* IND_DESTINATION entry ? -> save destination address */
>> +	andi	t1, t0, 0x1
>> +	beqz	t1, 2f
>> +	andi	s4, t0, ~0x1
>> +	j	1b
>> +
>> +2:
>> +	/* IND_INDIRECTION entry ? -> update next entry ptr (PA) */
>> +	andi	t1, t0, 0x2
>> +	beqz	t1, 2f
>> +	andi	s0, t0, ~0x2
>> +	addi	s9, s9, 1
>> +	csrw	sptbr, zero
> 
> If I understand correctly (it's ambiguous in the ISA right now), we 
> don't need
> a fence here.  I've opened 
> https://github.com/riscv/riscv-isa-manual/issues/538
> for clarification.
> 

Thanks, that was also my understanding since we don't modify the page 
table but we just skip it.

>> +4:
>> +	/* Wait for the relocation to be visible by other harts */
>> +	fence	w,w
> 
> That's not how fences work.  A "fence w,w" just ensures that prior 
> writes are
> visible before subsequent writes.  Usually that would be some sort of 
> control
> write being ordered after the data writes, which on the other harts 
> would be
> paired with a "fence r,r" to make sure the control read is seen before 
> any
> other reads.  I don't see that, though.
> 
> I think the right answer here is to ignore ordering here, as we're 
> operating in
> a single processor mode, and instead push the fences into the CPU 
> hotplug up
> codepath.  We have to handle that anyway, so we might as well just do 
> it once.
> 

Good catch, that line was a leftover from debuging CPU_HOTPLUG, I'll 
clean it up. The fence.i later on is the important one.

>> +	/* Copy the assembler code for relocation to the control page */
>> +	control_code_buffer = page_address(image->control_code_page);
>> +	memcpy(control_code_buffer, riscv_kexec_relocate,
>> +		riscv_kexec_relocate_size);
> 
> The toolchain doesn't make any guarantees that you can move code around 
> without
> -fPIC, but we're already taking advantage of the fact that simply 
> medany code
> can be moved around so this should be OK.  This is all assembly and it 
> looks
> fine, but since it's placed in the rodata segment there's at least a
> reasonably possibility that GP could end up close enough that things 
> get
> relaxed.  We'd be safer adding .norelax to that assembly file, as 
> there's
> nothing in there that's performance critical.
> 

Good point, I assumed PC-relative addressing. Should I also add medany 
to the CFLAGS of machine_kexec.c or add a dependency on CMODEL_MEDANY ?

> Also, as far as I can tell nothing is checking that 
> riscv_kexec_relocate_size
> is smaller than a page.  That could cause overflows in this memcpy, but 
> a
> static check should be sufficient.
> 

A page is huge compared to the relocation code plus we already know its 
size at compile time so I thought it was an overkill to check it at 
runtime every time. I'll see if I can do something with the preprocessor 
there or I'll add a runtime check just in case.

>> +
>> +	/* Mark the control page executable */
>> +	set_memory_x((unsigned long) control_code_buffer, 1);
>> +
>> +#ifdef CONFIG_SMP
>> +	/*
>> +	 * Make sure other harts see the copied data
>> +	 * if they try to read the control buffer
>> +	 */
>> +	smp_wmb();
>> +#endif
> 
> This appears to have similar issues as the fence.
> 
> Also, smp_wmb() already has the relevant CONFIG_SMP checks.
> 

I thought it would be safer to add a write memory barrier here, on the 
other hand this code is executed upon loading the image so it doesn't 
make much sense, I'll clean it up.

>> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
>> index 05669c87a..778dc191c 100644
>> --- a/include/uapi/linux/kexec.h
>> +++ b/include/uapi/linux/kexec.h
>> @@ -42,6 +42,7 @@
>>  #define KEXEC_ARCH_MIPS_LE (10 << 16)
>>  #define KEXEC_ARCH_MIPS    ( 8 << 16)
>>  #define KEXEC_ARCH_AARCH64 (183 << 16)
>> +#define KEXEC_ARCH_RISCV   (243 << 16)
> 
> I usually split the UAPI changes out as their own patch.  This one is 
> pretty
> trivial, but it's always nice to make sure everyone knows we're making 
> a UAPI
> change just to be on the safe side.
> 

Good point, the reason I put this here is so that the patch is complete 
and compiles fine (we include this on include/asm/kexec.h).