[RISC-V] [tech-j-ext] [RFC PATCH 5/9] riscv: Split per-CPU and per-thread envcfg bits

Thu Mar 21 20:43:22 PDT 2024

Hi Deepak,

On 2024-03-20 6:27 PM, Deepak Gupta wrote:
>>>> And instead of context switching in `_switch_to`,
>>>> In `entry.S` pick up `envcfg` from `thread_info` and write it into CSR.
>>>
>>> The immediate reason is that writing envcfg in ret_from_exception() adds cycles
>>> to every IRQ and system call exit, even though most of them will not change the
>>> envcfg value. This is especially the case when returning from an IRQ/exception
>>> back to S-mode, since envcfg has zero effect there.
>>>
>>> The CSRs that are read/written in entry.S are generally those where the value
>>> can be updated by hardware, as part of taking an exception. But envcfg never
>>> changes on its own. The kernel knows exactly when its value will change, and
>>> those places are:
>>>
>>>  1) Task switch, i.e. switch_to()
>>>  2) execve(), i.e. start_thread() or flush_thread()
>>>  3) A system call that specifically affects a feature controlled by envcfg
>>
>> Yeah I was optimizing for a single place to write instead of
>> sprinkling at multiple places.
>> But I see your argument. That's fine.
>>
> 
> Because this is RFC and we are discussing it. I thought a little bit
> more about this.

Thanks for your comments and the discussion! I know several in-progress features
depend on envcfg, so hopefully we can agree on a design acceptable to everyone.

> If we were to go with the above approach that essentially requires
> whenever a envcfg bit changes, `sync_envcfg`
> has to be called to reflect the correct value.

sync_envcfg() is only needed if the task being updated is `current`. Would it be
more acceptable if this happened inside a helper function? Something like:

static inline void envcfg_update_bits(struct task_struct *task,
				      unsigned long mask, unsigned long val)
{
	unsigned long envcfg;

	envcfg = (task->thread.envcfg & ~mask) | val;
	task->thread.envcfg = envcfg;
	if (task == current)
		csr_write(CSR_ENVCFG, this_cpu_read(riscv_cpu_envcfg) | envcfg);
}

> What if some of these features enable/disable are exposed to `ptrace`
> (gdb, etc use cases) for enable/disable.
> How will syncing work then ?

ptrace_check_attach() ensures the tracee is scheduled out while a ptrace
operation is running, so there is no need to sync anything. Any changes to
task->thread.envcfg are written to the CSR when the tracee is scheduled back in.

> I can see the reasoning behind saving some cycles during trap return.
> But `senvcfg` is not actually a user state, it
> controls the execution environment configuration for user mode. I
> think the best place for this CSR to be written is
> trap return and writing at a single place from a single image on stack
> reduces chances of bugs and errors. And allows
> `senvcfg` features to be exposed to other kernel flows (like `ptrace`)

If ptrace is accessing a process, then task->thread.envcfg is always up to date.
The only complication is that the per-CPU bits need to be ORed back in to get
the real CSR value for another process, but this again is unrelated to whether
the CSR is written in switch_to() or ret_from_exception().

> We can figure out ways on how to optimize in trap return path to avoid
> writing it if we entered and exiting on the same
> task.

Optimizing out the CSR write when the task did not switch requires knowing if
the current task's envcfg was changed during this trip to S-mode... and this
starts looking similar to sync_envcfg().

Regards,
Samuel