[PATCH v3 0/4] riscv: optimize Vector context restore on syscall

Andy Chiu tchiu at tenstorrent.com
Thu May 21 14:09:40 PDT 2026


On Thu, May 21, 2026 at 12:15:07PM -0700, Olof Johansson wrote:
> Hi Andy,
> 
> On Thu, May 21, 2026 at 11:25:16AM -0500, Andy Chiu wrote:
> > This patch series optimizes riscv vector state handling across syscall
> > boundaries and context switches. The kernel now keeps track of the
> > INITIAL state in sstatus.vs to optimize unnecessary context management
> > operations.
> > 
> > This version merges daichengrong's RFC patch [1] for the state tracking
> > code as it looks cleaner than my v2/v1.
> > 
> > [1]: https://lore.kernel.org/linux-riscv/7ba2f4b7-8475-4ec3-ab31-58b332bda47e@iscas.ac.cn/#r
> > Link to v2: https://lore.kernel.org/linux-riscv/20260402043414.2421916-1-andybnac@gmail.com/
> 
> A patchset like this would be really helped by some kind of numbers in the
> cover letter to indicate how much performance moved, given a claim of
> optimization.
> 

Thanks for pointing it out, I totally agree with you. I had included a
test result on sifive's hardware in v2[1]. But that was on FPGA, I will
test it on a real silicon as soon as I have an access. Sorry for the
confusing claim here.

My test was running on a vector enabled version of lat_ctx. I modified
the main function to make sure the process touches vector, then run with
2 threads. Since lat_ctx uses syscall interface to notify another
process, the kernel will trash their vector registers instead of wasting
cycles on saving/restoring them.

> 
> Just for kicks I tried a simple microbenchmark for syscalls from
> a vector-enabled process:
> 
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> #include <time.h>
> #include <stdint.h>
> 
> static inline uint64_t ns_now(void) {
>     struct timespec t;
>     clock_gettime(CLOCK_MONOTONIC, &t);
>     return t.tv_sec * 1000000000ull + t.tv_nsec;
> }
> 
> int main(int argc, char **argv) {
>     int iters = argc > 1 ? atoi(argv[1]) : 10000000;
>     int use_v = argc > 2 ? atoi(argv[2]) : 1;
> 
>     if (use_v) {
>         asm volatile(
>             ".option push\n\t.option arch, +v\n\t"
>             "vsetivli x0, 1, e32, m1, ta, ma\n\t"
>             "vmv.v.i v0, 1\n\t"
>             ".option pop\n\t" ::: "memory");
>     }
> 
>     for (int i = 0; i < 10000; i++) syscall(SYS_getppid);  // warmup
> 
>     uint64_t t0 = ns_now();
>     for (int i = 0; i < iters; i++) syscall(SYS_getppid);
>     uint64_t t1 = ns_now();
> 
>     printf("V=%d %.1f ns/call (%lu ns / %d iters)\n",
>            use_v, (double)(t1 - t0) / iters, t1 - t0, iters);
>     return 0;
> }
> 
> 
> I compiled with gcc -O3, default GCC 14.2 on Debian 13. Host is x280
> (Blackhole). Base kernel sources is 7.1.0-rc4-next-20260520 defconfig. Ran
> with taskset to pin to one of the CPUs.
> 
> The testcase doesn't use vector inbetween each syscall, but will obviously
> have initiated the state (if started with '1' as second argument).
> 
> Without this patchset:
> V=1 242.9 ns/call (12144527848 ns / 50000000 iters)
> 
> With this patchset:
> V=1 264.5 ns/call (13226852900 ns / 50000000 iters)

This 9% regression is suprising to me as this patch set (without the fix
below) should be equivalent in performance on this code path (and better
if getpid takes one process switch).

Before the patch, nulling v resgisters happens at the entry point. After
this patchset we mark sstatus.vs to INITIAL at the entry and nulling
happens right before getting back to the user space.

> 
> Interestingly enough, with V=0 test it sped up slightly (194.3 -> 189.5 ns).
> 

This result is expected as with V=0 the kernel doesn't have to maintain
vstate at all. But we are also looking into ways to improve mode switch
latencies.

> I repeated the runs a few times, with similar results so I don't think it's
> explainable as noise.
> 

Thanks for carrying out the experiment, it's very sound! I actually
missed one thing on this patch for it to be optimized on this specific
case:

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 8c1e64e0dd0b..5d1282870a20 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -40,6 +40,15 @@
 	_res;								\
 })
 
+#define __riscv_v_vstate_check_gt(_val, TYPE) ({			\
+	bool _res;							\
+	if (has_xtheadvector()) 					\
+		_res = ((_val) & SR_VS_THEAD) > SR_VS_##TYPE##_THEAD;	\
+	else								\
+		_res = ((_val) & SR_VS) > SR_VS_##TYPE;			\
+	_res;								\
+})
+
 extern unsigned long riscv_v_vsize;
 int riscv_v_setup_vsize(void);
 bool insn_is_vector(u32 insn_buf);
@@ -323,7 +332,7 @@ static inline void riscv_v_vstate_set_restore(struct task_struct *task,
 
 static inline void riscv_v_vstate_discard(struct pt_regs *regs)
 {
-	if (riscv_v_vstate_query(regs)) {
+	if (__riscv_v_vstate_check_gt(regs->status, INITIAL)) {
 		riscv_v_vstate_set_restore(current, regs);
 		riscv_v_vstate_init(regs);
 	}

In this way if the user never touch vregs again then we will not null
out the context at syscall exit.

Again, I only have it functionally tested at the moment. I appreciate it
if you could get the number with the above diff. Meanwhile, I am going
souce and run on an actual hardware and hopefully find the reason for
the above regression, before rolling out v4.

> 
> Given that more code will be vector enabled in the new shiny RVA23 world
> we are entering, I'm uncertain whether this is the right trade-off. You won't
> get the syscall perf cost returned unless you need the vector context swapped
> in without the lazy fault between calls.
> 
> I suspect running userspace workloads on a RVA23 platform (SpaceMIT
> K3) with Ubuntu 26.04 would be the most meaningful data to collect. My
> ordered board is still in shipping, unfortunately.
> 
> 
> PS: There's a new build warning due to an unused 'uvstate' variable in
> riscv_v_start_kernel_context() that you might want to fix.
> 
> 
> -Olof

[1]: https://lore.kernel.org/linux-riscv/20260402043414.2421916-2-andybnac@gmail.com/

Thanks,
Andy



More information about the linux-riscv mailing list