[PATCH] riscv: Fix vector state restore in rt_sigreturn()

Vineet Gupta vineetg at rivosinc.com
Wed Apr 3 10:33:10 PDT 2024


On 4/3/24 00:26, Björn Töpel wrote:
> From: Björn Töpel <bjorn at rivosinc.com>
>
> The RISC-V Vector specification states in "Appendix D: Calling
> Convention for Vector State" [1] that "Executing a system call causes
> all caller-saved vector registers (v0-v31, vl, vtype) and vstart to
> become unspecified." In the RISC-V kernel, this is called "discarding
> the vstate".
>
> Returning from a signal handler via the rt_sigreturn() syscall also
> performs a vector discard. This is not an issue in itself, since the
> vector state is restored from the sigcontext anyway and should
> therefore be unaffected by the discard.
>
> The "live state" is the actual vector register state in the running
> context, and the "vstate" is the saved vector state of the task. A
> dirty live state means that the vstate and the live state are not in
> sync.
>
> When the vectorized copy_from_user() was introduced, a bug sneaked
> into the restoration code, related to the discard of the live state.
>
> An example of how this goes wrong:
>
>   1. A userland application is executing vector code
>   2. The application receives a signal, and the signal handler is
>      entered.
>   3. The application returns from the signal handler, using the
>      rt_sigreturn() syscall.
>   4. The live vector state is discarded upon entering
>      rt_sigreturn(), and the live state is marked as "dirty", indicating
>      that it needs to be synchronized with the current vstate.
>   5. rt_sigreturn() restores the vstate from the sigcontext, except
>      for the vector registers.
>   6. rt_sigreturn() restores the vector registers from the sigcontext,
>      now using the vectorized copy_from_user(). The dirty live state
>      from the discard is saved to the vstate, corrupting it.
>   7. rt_sigreturn() returns to the application, which crashes due to
>      the corrupted vstate.
>
> Note that whether the vectorized copy_from_user() is used depends on
> the value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. The default is 768,
> which means that VLEN has to be larger than 128 bits for this bug to
> trigger.
>
> The fix is simply to mark the live state as clean (non-dirty) prior to
> performing the vstate restore.
>
> Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1]
> Reported-by: Charlie Jenkins <charlie at rivosinc.com>
> Reported-by: Vineet Gupta <vgupta at kernel.org>
> Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user")
> Signed-off-by: Björn Töpel <bjorn at rivosinc.com>
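The failure sequence in the patch description (steps 4 to 7) and the fix can be illustrated with a toy model. This is a sketch, not kernel code: every struct and function name is hypothetical, only vl and vtype are modelled, and it assumes that the discard leaves vtype with the vill bit set (matching the "VILL" symptom reported further down) and that the live state is reloaded from the vstate on return to user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* vill is bit XLEN-1 of vtype; RV64 is assumed here. */
#define VTYPE_VILL (UINT64_C(1) << 63)

/* Only vl and vtype are modelled; v0-v31 contents are left out. */
struct vcsrs {
	uint64_t vl;
	uint64_t vtype;
};

/* Toy model of one task. All names are hypothetical, not kernel APIs. */
struct task_model {
	struct vcsrs vstate;  /* the task's saved vector state */
	struct vcsrs live;    /* what the vector unit currently holds */
	bool live_dirty;      /* live state not yet synced to vstate */
};

/* Syscall entry: the live state becomes unspecified and is marked dirty.
 * Assumption: the discard leaves vtype with vill set. */
static void discard(struct task_model *t)
{
	t->live.vl = 0;
	t->live.vtype = VTYPE_VILL;
	t->live_dirty = true;
}

/* Stand-in for the start of a vectorized copy: a dirty live state is
 * saved into vstate before the kernel borrows the vector unit. */
static void vector_copy_begin(struct task_model *t)
{
	if (t->live_dirty) {
		t->vstate = t->live;
		t->live_dirty = false;
	}
}

/* Steps 4-7; returns the state user space resumes with. */
static struct vcsrs model_sigreturn(struct vcsrs sigctx, bool apply_fix)
{
	struct task_model t = { 0 };

	discard(&t);              /* step 4: discard on syscall entry */
	t.vstate = sigctx;        /* step 5: restore vl/vtype from sigcontext */
	if (apply_fix)
		t.live_dirty = false; /* the fix: mark the live state clean */
	vector_copy_begin(&t);    /* step 6: vectorized register copy */
	t.live = t.vstate;        /* step 7 (assumed): reload live on return */
	return t.live;
}
```

With `apply_fix` set to false, `model_sigreturn()` hands back a vtype with vill set, because the vectorized copy saves the discarded live state over the freshly restored vstate. With `apply_fix` set to true, the copy finds nothing dirty to save, so the sigcontext values survive.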

Tested-by: Vineet Gupta <vineetg at rivosinc.com>
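The 128-bit VLEN bound in the patch's note on CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD is plain arithmetic on the size of the vector register area copied back from the sigcontext. The sketch below assumes that this area holds all 32 registers at VLEN/8 bytes each and that the vectorized copy is taken once the copy size reaches the threshold; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Default of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD quoted in the patch. */
#define UCOPY_THRESHOLD 768u
/* Number of vector registers, v0-v31. */
#define NUM_VREGS 32u

/* Bytes of vector register data copied back from the sigcontext,
 * assuming all 32 registers of VLEN/8 bytes each are restored. */
static unsigned vreg_area_bytes(unsigned vlen_bits)
{
	assert(vlen_bits > 0 && vlen_bits % 8 == 0);
	return NUM_VREGS * (vlen_bits / 8);
}

/* Hypothetical predicate: assumes the vectorized copy is used once the
 * copy size reaches the threshold. */
static bool uses_vector_copy(unsigned vlen_bits)
{
	return vreg_area_bytes(vlen_bits) >= UCOPY_THRESHOLD;
}

/* Prints the copy size and the path taken for one VLEN. */
static void report(unsigned vlen_bits)
{
	printf("VLEN=%u: %u bytes -> %s copy\n", vlen_bits,
	       vreg_area_bytes(vlen_bits),
	       uses_vector_copy(vlen_bits) ? "vectorized" : "scalar");
}
```

Calling `report(128)` prints `VLEN=128: 512 bytes -> scalar copy`, while `report(256)` prints `VLEN=256: 1024 bytes -> vectorized copy`: at VLEN=128 the 512-byte area stays below 768 bytes, so the buggy path is never reached.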

For completeness (and fun)

1. The issue was triggered on a dual-core Spike run with a seemingly
benign workload (the key is repeated fork/execve/exit with a little I/O):

    some-shell-script.sh

    #!/bin/bash

    # Background loop: constant fork/execve/exit with a little I/O
    (while true; do ls; done) &

    for i in $(seq 1 20); do
       <long running job>
    done

2. The issue initially manifests as follows: a vector store instruction,
before it even starts to run, invalidates its own context (page fault ->
preemption -> handle-signal -> sigreturn -> VILL / V-clobber), so that
when it eventually runs, it takes an illegal-instruction exception,
taking down the entire program.
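The "VILL" above refers to the vill bit of the vtype CSR. Per the RISC-V Vector specification, vill sits in bit XLEN-1, and while it is set any vector instruction that depends on vtype raises an illegal-instruction exception, which is why the store traps. The sketch below decodes an RV64 vtype value; the struct and function names are hypothetical, and reserved encodings are not checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field layout of vtype on RV64 (XLEN = 64). */
struct vtype_fields {
	unsigned vlmul;  /* bits [2:0]: LMUL encoding */
	unsigned sew;    /* bits [5:3]: SEW = 8 << vsew, in bits */
	bool vta;        /* bit 6: tail agnostic */
	bool vma;        /* bit 7: mask agnostic */
	bool vill;       /* bit 63 (XLEN-1): illegal vtype */
};

/* Splits a raw vtype value into its fields (reserved encodings unchecked). */
static struct vtype_fields decode_vtype(uint64_t vtype)
{
	struct vtype_fields f;

	f.vlmul = (unsigned)(vtype & 0x7);
	f.sew = 8u << ((vtype >> 3) & 0x7);
	f.vta = (vtype >> 6) & 1;
	f.vma = (vtype >> 7) & 1;
	f.vill = (vtype >> 63) & 1;
	return f;
}

/* True if a vtype-dependent vector instruction would trap (vill set). */
static bool vector_insn_traps(uint64_t vtype)
{
	return decode_vtype(vtype).vill;
}
```

For example, `decode_vtype(0x18)` yields SEW=64 and LMUL encoding 0 with vill clear, while any value with bit 63 set, such as the one a discard leaves behind, makes `vector_insn_traps()` return true.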

Thx,
-Vineet

More information about the linux-riscv mailing list