[RFC PATCH v8 09/21] riscv: Add task switch support for vector

Vincent Chen vincent.chen at sifive.com
Thu Oct 21 20:52:01 PDT 2021


On Thu, Oct 21, 2021 at 6:50 PM Darius Rad <darius at bluespec.com> wrote:
>
> On Wed, Oct 20, 2021 at 06:01:31PM -0700, Paul Walmsley wrote:
> > Hello Darius,
> >
> > On Tue, 5 Oct 2021, Darius Rad wrote:
> >
> > > On Mon, Oct 04, 2021 at 08:36:30PM +0800, Greentime Hu wrote:
> > > > Darius Rad <darius at bluespec.com> 於 2021年9月29日 週三 下午9:28寫道:
> > > > >
> > > > > On Tue, Sep 28, 2021 at 10:56:52PM +0800, Greentime Hu wrote:
> > > > > > Darius Rad <darius at bluespec.com> 於 2021年9月13日 週一 下午8:21寫道:
> > > > > > >
> > > > > > > On 9/8/21 1:45 PM, Greentime Hu wrote:
> > > > > > > > This patch adds task switch support for vector. It supports partial lazy
> > > > > > > > save and restore mechanism. It also supports all lengths of vlen.
> >
> > [ ... ]
> >
> > > > > > > So this will unconditionally enable vector instructions, and allocate
> > > > > > > memory for vector state, for all processes, regardless of whether vector
> > > > > > > instructions are used?
> > > > > >
> > > > > > Yes, it will enable vector if has_vector() is true. The reason that we
> > > > > > choose to enable and allocate memory for user space program is because
> > > > > > we also implement some common functions in the glibc such as memcpy
> > > > > > vector version and it is called very often by every process. So that
> > > > > > we assume if the user program is running in a CPU with vector ISA
> > > > > > would like to use vector by default. If we disable it by default and
> > > > > > make it trigger the illegal instruction, that might be a burden since
> > > > > > almost every process will use vector glibc memcpy or something like
> > > > > > that.
> > > > >
> > > > > Do you have any evidence to support the assertion that almost every process
> > > > > would use vector operations?  One could easily argue that the converse is
> > > > > true: no existing software uses the vector extension now, so most likely a
> > > > > process will not be using it.
> > > >
> > > > Glibc ustreaming is just starting so you didn't see software using the
> > > > vector extension now and this patchset is testing based on those
> > > > optimized glibc too. Vincent Chen is working on the glibc vector
> > > > support upstreaming and we will also upstream the vector version glibc
> > > > memcpy, memcmp, memchr, memmove, memset, strcmp, strlen. Then we will
> > > > see platform with vector support can use vector version mem* and str*
> > > > functions automatically based on ifunc and platform without vector
> > > > will use the original one automatically. These could be done to select
> > > > the correct optimized glibc functions by ifunc mechanism.
> >
> > In your reply, I noticed that you didn't address Greentime's response
> > here.  But this looks like the key issue.  If common library functions are
> > vector-accelerated, wouldn't it make sense that almost every process would
> > wind up using vector instructions?  And thus there wouldn't be much point
> > to skipping the vector context memory allocation?
> >
>
> This issue was addressed in the thread regarding Intel AMX I linked to in a
> previous message.  I don't agree that this is the key issue; it is one of a
> number of issues.  What if I don't want to take the potential
> power/frequency hit for the vector unit for a workload that, at best, uses
> it for the occasional memcpy?  What if the allocation fails, how will that

Hi Darius,
The memcpy function seems not to be occasionally used in the programs
because many functions in Glibc use memcpy() to complete the memory
copy. I use the following simple case as an example.
test.c
void main(void) {
    return;
}
Then, we compile it by "gcc test.c -o a.out" and execute it. In the
execution, the memcpy() has been called unexpectedly. It is because
many libc initialized functions will be executed before entering the
user-defined main function. One of the example is __libc_setup_tls(),
which is called by __libc_start_main(). The __libc_setup_tls() will
use memcpy() during the process of creating the Dynamic Thread Vector
(DTV).

Therefore, I think the memcpy() is widely used in most programs.

> get reported to user space (hint: not well)?  According to Greentime,
> RISC-V vector is similar to ARM SVE, which allocates memory for context
> state on first use and not unconditionally for all processes.
>
> // darius
>



More information about the linux-riscv mailing list