[PATCH v6 1/5] riscv: save the SR_SUM status over switches

Thu May 22 07:49:27 PDT 2025

On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>On 20/05/2025 17:49, Deepak Gupta wrote:
>>I did give this patch my RB and had planned to come back to it to see
>>if it impacts cfi related patches. Thanks to alex for brinigng to my
>>attention again. As it stands today, it doesn't impact cfi related
>>changes but I've some concerns.
>>
>>Overall I do agree we should reduce number of SSTATUS accesses.
>>
>>Couple of questions on introducing new `sstatus` field (inline)
>>
>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>From: Ben Dooks <ben.dooks at codethink.co.uk>
>>>>
>>>>When threads/tasks are switched we need to ensure the old execution's
>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>restored.
>>>>
>>>>The issue was seen under heavy load especially with the syz-stress tool
>>>>running, with crashes as follows in schedule_tail:
>>>>
>>>>Unable to handle kernel access to user memory without uaccess routines
>>>>at virtual address 000000002749f0d0
>>>>Oops [#1]
>>>>Modules linked in:
>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>000000000000000f
>>>>Call Trace:
>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>Dumping ftrace buffer:
>>>> (ftrace buffer empty)
>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>
>>>>The issue comes from the put_user() in schedule_tail
>>>>(kernel/sched/core.c) doing the following:
>>>>
>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>{
>>>>...
>>>>      if (current->set_child_tid)
>>>>              put_user(task_pid_vnr(current), current->set_child_tid);
>>>>...
>>>>}
>>>>
>>>>the put_user() macro causes the code sequence to come out as follows:
>>>>
>>>>1:    __enable_user_access()
>>>>2:    reg = task_pid_vnr(current);
>>>>3:    *current->set_child_tid = reg;
>>>>4:    __disable_user_access()
>>>>
>>>>The problem is that we may have a sleeping function as argument which
>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>evaluating the argument of the put_user() macro outside the user-enabled
>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>enabling user access")"
>>>>
>>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>>to avoid the same issue we had with put_user() and sleeping functions we
>>>>must ensure code flow can go through switch_to() from within a region of
>>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>patch addresses the problem allowing future work to enable full use of
>>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>
>>>>Reported-by: syzbot+e74b94fe601ab9552d69 at syzkaller.appspotmail.com
>>>>Signed-off-by: Ben Dooks <ben.dooks at codethink.co.uk>
>>>>Signed-off-by: Cyril Bur <cyrilbur at tenstorrent.com>
>>>>---
>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>3 files changed, 14 insertions(+)
>>>>
>>>>diff --git a/arch/riscv/include/asm/processor.h 
>>>>b/arch/riscv/include/ asm/processor.h
>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>--- a/arch/riscv/include/asm/processor.h
>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>    struct __riscv_d_ext_state fstate;
>>>>    unsigned long bad_cause;
>>>>    unsigned long envcfg;
>>>>+    unsigned long status;
>>
>>Do we really need a new member field in `thread_struct`. We already have
>>`sstatus` in `pt_regs` which reflects overall execution environment 
>>situation
>>for current thread. This gets saved and restored on trap entry and exit.
>>
>>If we put `status` in `thread_struct` it creates ambiguity in terms 
>>of which
>>`status` to save to and pick from from future maintainibility 
>>purposes as the
>>fields get introduced to this CSR.
>>
>>Why can't we access current trap frame's `sstatus` image in 
>>`__switch_to` to
>>save and restore?
>>
>>Let me know if I am missing something obvious here. If there is a 
>>complication,
>>I am missing here and we do end up using this member field, I would 
>>rename it
>>to something like `status_kernel` to reflect that. So that future 
>>changes are
>>cognizant of the fact that we have split `status`. One for kernel 
>>execution env
>>per thread and one for controlling user execution env per thread.
>
>This is so long ago now I cannot remember if there was any sstatus in
>the pt_regs field, 

FS/VS bits encode status of floating point and vector on per-thread basis.
So `status` has been part of `pt_regs` for quite a while. 

> and if kernel threads have the same context as their
>userland parts.

I didn't mean kernel thread. What I meant was kernel execution environment
per-thread. A userland thread does spend sometime in kernel and kernel does
things on its behalf. One of those thing is touching user memory and that
requires mucking with this CSR. So what I meant was are we splitting `status`
on per-thread basis for their time spent in user and kernel.

Getting back to original question--
As I said, each thread spends sometime in user or in kernel. `status` in
`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
`status` field in `pt_regs` is reflecting execution status of the thread on per
trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
for today, certainly for future) of which `status` to pick from when we are
doing save/restore.

So my first question was why not to use `status` in `pt_regs`. It is granular
as it can get (it is available per thread context per trap basis). 

I did ask Alex as well. I'll ping him again.

>
>Does anyone else have any comment on this?
>
>>
>>>>    u32 riscv_v_flags;
>>>>    u32 vstate_ctrl;
>>>>    struct __riscv_v_ext_state vstate;
>>>>diff --git a/arch/riscv/kernel/asm-offsets.c 
>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>index 16490755304e..969c65b1fe41 100644
>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv at lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>
>
>-- 
>Ben Dooks				http://www.codethink.co.uk/
>Senior Engineer				Codethink - Providing Genius
>
>https://www.codethink.co.uk/privacy.html