[PATCH V5] raid6: Add RISC-V SIMD syndrome and recovery calculations
Alexandre Ghiti
alex at ghiti.fr
Wed May 21 02:00:44 PDT 2025
On 5/13/25 13:39, Alexandre Ghiti wrote:
> Hi Chunyan,
>
> On 08/05/2025 09:14, Chunyan Zhang wrote:
>> Hi Palmer,
>>
>> On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer at dabbelt.com> wrote:
>>> On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan at iscas.ac.cn wrote:
>>>> The assembly is originally based on the ARM NEON and int.uc, but uses
>>>> RISC-V vector instructions to implement the RAID6 syndrome and
>>>> recovery calculations.
>>>>
>>>> The functions are tested on QEMU running with the option "-icount
>>>> shift=0":
>>> Does anyone have hardware benchmarks for this? There's a lot more code
>>> here than the other targets have. If all that unrolling is necessary
>>> for performance on real hardware then it seems fine to me, but just
>>> having it for QEMU doesn't really tell us much.
>> I made tests on Banana Pi BPI-F3 and Canaan K230.
>>
>> BPI-F3 is designed with SpacemiT K1 8-core RISC-V chip, the test
>> result on BPI-F3 was:
>>
>> raid6: rvvx1 gen() 2916 MB/s
>> raid6: rvvx2 gen() 2986 MB/s
>> raid6: rvvx4 gen() 2975 MB/s
>> raid6: rvvx8 gen() 2763 MB/s
>> raid6: int64x8 gen() 1571 MB/s
>> raid6: int64x4 gen() 1741 MB/s
>> raid6: int64x2 gen() 1639 MB/s
>> raid6: int64x1 gen() 1394 MB/s
>> raid6: using algorithm rvvx2 gen() 2986 MB/s
>> raid6: .... xor() 2 MB/s, rmw enabled
>> raid6: using rvv recovery algorithm
So I'm playing with my new Banana Pi and I got the following numbers:
[ 0.628134] raid6: int64x8 gen() 1074 MB/s
[ 0.696263] raid6: int64x4 gen() 1574 MB/s
[ 0.764383] raid6: int64x2 gen() 1677 MB/s
[ 0.832504] raid6: int64x1 gen() 1387 MB/s
[ 0.833824] raid6: using algorithm int64x2 gen() 1677 MB/s
[ 0.907378] raid6: .... xor() 829 MB/s, rmw enabled
[ 0.909301] raid6: using intx1 recovery algorithm
So I realize that you already provided the numbers I asked for... Sorry
about that. That's a very nice improvement, well done.
I'll add your patch as-is for 6.16.
Thanks again,
Alex
>>
>> The K230 uses the dual-core XuanTie C908 processor, with the larger of
>> the two cores featuring the RVV 1.0 extension. The test result on the K230 was:
>>
>> raid6: rvvx1 gen() 1556 MB/s
>> raid6: rvvx2 gen() 1576 MB/s
>> raid6: rvvx4 gen() 1590 MB/s
>> raid6: rvvx8 gen() 1491 MB/s
>> raid6: int64x8 gen() 1142 MB/s
>> raid6: int64x4 gen() 1628 MB/s
>> raid6: int64x2 gen() 1651 MB/s
>> raid6: int64x1 gen() 1391 MB/s
>> raid6: using algorithm int64x2 gen() 1651 MB/s
>> raid6: .... xor() 879 MB/s, rmw enabled
>> raid6: using rvv recovery algorithm
>>
>> Among the rvv algorithms, the fastest unrolling was rvvx2 on the
>> BPI-F3 and rvvx4 on the K230.
>>
>> I only have these two RVV boards for now, so I have no test data from
>> other systems; I'm not sure whether rvvx8 will be needed on some
>> hardware or in other system environments.
>
>
> Can we have a comparison before and after the use of your patch?
>
> In addition, how do you check the correctness of your implementation?
>
> I'll add whatever numbers you provide to the commit log and merge your
> patch for 6.16.
>
> Thanks a lot,
>
> Alex
>
>
>>
>> Thanks,
>> Chunyan
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>