[PATCH V5] raid6: Add RISC-V SIMD syndrome and recovery calculations

Tue May 13 04:39:14 PDT 2025

Hi Chunyan,

On 08/05/2025 09:14, Chunyan Zhang wrote:
> Hi Palmer,
>
> On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer at dabbelt.com> wrote:
>> On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan at iscas.ac.cn wrote:
>>> The assembly is based on the ARM NEON and int.uc implementations, but
>>> uses RISC-V vector instructions to implement the RAID6 syndrome and
>>> recovery calculations.
>>>
>>> The functions are tested on QEMU running with the option "-icount shift=0":
>> Does anyone have hardware benchmarks for this?  There's a lot more code
>> here than the other targets have.  If all that unrolling is necessary for
>> performance on real hardware then it seems fine to me, but just having
>> it for QEMU doesn't really tell us much.
> I ran tests on the Banana Pi BPI-F3 and the Canaan K230.
>
> The BPI-F3 is built around the SpacemiT K1 8-core RISC-V chip. The
> test result on BPI-F3 was:
>
>    raid6: rvvx1    gen()  2916 MB/s
>    raid6: rvvx2    gen()  2986 MB/s
>    raid6: rvvx4    gen()  2975 MB/s
>    raid6: rvvx8    gen()  2763 MB/s
>    raid6: int64x8  gen()  1571 MB/s
>    raid6: int64x4  gen()  1741 MB/s
>    raid6: int64x2  gen()  1639 MB/s
>    raid6: int64x1  gen()  1394 MB/s
>    raid6: using algorithm rvvx2 gen() 2986 MB/s
>    raid6: .... xor() 2 MB/s, rmw enabled
>    raid6: using rvv recovery algorithm
>
> The K230 uses the XuanTie C908 dual-core processor, and its larger
> C908 core implements the RVV 1.0 extension. The test result on K230 was:
>
>    raid6: rvvx1    gen()  1556 MB/s
>    raid6: rvvx2    gen()  1576 MB/s
>    raid6: rvvx4    gen()  1590 MB/s
>    raid6: rvvx8    gen()  1491 MB/s
>    raid6: int64x8  gen()  1142 MB/s
>    raid6: int64x4  gen()  1628 MB/s
>    raid6: int64x2  gen()  1651 MB/s
>    raid6: int64x1  gen()  1391 MB/s
>    raid6: using algorithm int64x2 gen() 1651 MB/s
>    raid6: .... xor() 879 MB/s, rmw enabled
>    raid6: using rvv recovery algorithm
>
> Compared with the other RVV variants, the fastest unrolling was rvvx2
> on BPI-F3 and rvvx4 on K230.
>
> I only have these two RVV boards for now, so I have no test data from
> other systems, and I'm not sure whether rvvx8 will be needed on other
> hardware or system environments.

Can we have a comparison before and after the use of your patch?

In addition, how do you check the correctness of your implementation?
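One possible check, sketched here under assumptions: run the RVV
gen_syndrome() and a plain scalar reference over the same data blocks and
compare P and Q byte for byte. The names gf_mul2() and ref_gen_syndrome()
below are hypothetical. The sketch assumes the usual RAID-6 field GF(2^8)
with polynomial 0x11d and generator {02}, as used by the generic int.uc
code. It only covers syndrome generation, not recovery.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply a GF(2^8) element by the generator {02},
 * reducing by the RAID-6 polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/* Hypothetical scalar reference for the P/Q syndrome over ndata blocks:
 *   P = D0 ^ D1 ^ ... ^ D(n-1)
 *   Q = {02}^0*D0 ^ {02}^1*D1 ^ ... ^ {02}^(n-1)*D(n-1)
 * Q is evaluated with Horner's rule, starting from the highest data block,
 * so an optimised implementation's output can be compared against p/q. */
static void ref_gen_syndrome(int ndata, size_t bytes,
			     const uint8_t *const *data,
			     uint8_t *p, uint8_t *q)
{
	for (size_t i = 0; i < bytes; i++) {
		uint8_t wp = data[ndata - 1][i];
		uint8_t wq = wp;

		for (int d = ndata - 2; d >= 0; d--) {
			wp ^= data[d][i];
			wq = gf_mul2(wq) ^ data[d][i];
		}
		p[i] = wp;
		q[i] = wq;
	}
}
```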

I'll add whatever numbers you provide to the commit log and merge your 
patch for 6.16.

Thanks a lot,

Alex

>
> Thanks,
> Chunyan
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv