[PATCH V5] raid6: Add RISC-V SIMD syndrome and recovery calculations

Alexandre Ghiti alex at ghiti.fr
Wed May 21 02:00:44 PDT 2025


On 5/13/25 13:39, Alexandre Ghiti wrote:
> Hi Chunyan,
>
> On 08/05/2025 09:14, Chunyan Zhang wrote:
>> Hi Palmer,
>>
>> On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer at dabbelt.com> wrote:
>>> On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan at iscas.ac.cn 
>>> wrote:
>>>> The assembly is originally based on the ARM NEON and int.uc
>>>> implementations, but uses RISC-V vector instructions to implement the
>>>> RAID6 syndrome and recovery calculations.
>>>>
>>>> The functions were tested on QEMU running with the option
>>>> "-icount shift=0":
>>> Does anyone have hardware benchmarks for this?  There's a lot more code
>>> here than the other targets have.  If all that unrolling is necessary
>>> for performance on real hardware then it seems fine to me, but just
>>> having it for QEMU doesn't really tell us much.
>> I made tests on Banana Pi BPI-F3 and Canaan K230.
>>
>> The BPI-F3 is built around the SpacemiT K1 8-core RISC-V chip; the test
>> result on the BPI-F3 was:
>>
>>    raid6: rvvx1    gen()  2916 MB/s
>>    raid6: rvvx2    gen()  2986 MB/s
>>    raid6: rvvx4    gen()  2975 MB/s
>>    raid6: rvvx8    gen()  2763 MB/s
>>    raid6: int64x8  gen()  1571 MB/s
>>    raid6: int64x4  gen()  1741 MB/s
>>    raid6: int64x2  gen()  1639 MB/s
>>    raid6: int64x1  gen()  1394 MB/s
>>    raid6: using algorithm rvvx2 gen() 2986 MB/s
>>    raid6: .... xor() 2 MB/s, rmw enabled
>>    raid6: using rvv recovery algorithm
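
For context on what gen() is measuring: it is the RAID6 P/Q syndrome
computation over GF(2^8) with the polynomial 0x11d. A minimal scalar
sketch of that computation is below -- it is not the RVV code from the
patch, only the per-byte operation that the int.uc, NEON and RVV
variants apply to whole vector registers at a time, assuming the usual
lib/raid6 layout where dptr[disks-2] is P and dptr[disks-1] is Q:

#include <stddef.h>
#include <stdint.h>

/* Multiply one byte by x (i.e. by 2) in GF(2^8), polynomial 0x11d. */
static inline uint8_t gf_mul2(uint8_t b)
{
        return (uint8_t)(b << 1) ^ ((b & 0x80) ? 0x1d : 0x00);
}

/* Scalar reference: dptr[0..disks-3] are data, then P, then Q. */
void gen_syndrome_ref(int disks, size_t bytes, void **dptr)
{
        uint8_t **d = (uint8_t **)dptr;
        uint8_t *p = d[disks - 2], *q = d[disks - 1];
        int z0 = disks - 3;             /* highest data disk */

        for (size_t i = 0; i < bytes; i++) {
                uint8_t wp = d[z0][i];  /* running P: plain XOR */
                uint8_t wq = d[z0][i];  /* running Q: multiply by 2, then XOR */

                for (int z = z0 - 1; z >= 0; z--) {
                        wp ^= d[z][i];
                        wq = gf_mul2(wq) ^ d[z][i];
                }
                p[i] = wp;
                q[i] = wq;
        }
}

The SIMD versions compute the 0x1d reduction mask branch-free from each
byte's top bit and apply the shift/mask/XOR to a full vector register
per operation, which is presumably where the gen() speedups in the
tables come from.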


So I'm playing with my new BananaPi and I got the following numbers:

[    0.628134] raid6: int64x8  gen()  1074 MB/s
[    0.696263] raid6: int64x4  gen()  1574 MB/s
[    0.764383] raid6: int64x2  gen()  1677 MB/s
[    0.832504] raid6: int64x1  gen()  1387 MB/s
[    0.833824] raid6: using algorithm int64x2 gen() 1677 MB/s
[    0.907378] raid6: .... xor() 829 MB/s, rmw enabled
[    0.909301] raid6: using intx1 recovery algorithm

So I realize you had already provided the numbers I asked for... sorry
about that. That's a very nice improvement, well done.

I'll add your patch as-is for 6.16.

Thanks again,

Alex


>>
>> The K230 uses the dual-core XuanTie C908 processor, with the larger
>> core featuring the RVV 1.0 extension; the test result on the K230 was:
>>
>>    raid6: rvvx1    gen()  1556 MB/s
>>    raid6: rvvx2    gen()  1576 MB/s
>>    raid6: rvvx4    gen()  1590 MB/s
>>    raid6: rvvx8    gen()  1491 MB/s
>>    raid6: int64x8  gen()  1142 MB/s
>>    raid6: int64x4  gen()  1628 MB/s
>>    raid6: int64x2  gen()  1651 MB/s
>>    raid6: int64x1  gen()  1391 MB/s
>>    raid6: using algorithm int64x2 gen() 1651 MB/s
>>    raid6: .... xor() 879 MB/s, rmw enabled
>>    raid6: using rvv recovery algorithm
>>
>> We can see that the fastest unrolling factor was rvvx2 on the BPI-F3
>> and rvvx4 on the K230, compared with the other rvv variants.
>>
>> I only have these two RVV boards for now, so I have no test data from
>> other systems; I'm not sure whether rvvx8 will be needed on some
>> hardware or in other system environments.
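
On the unrolling question: in lib/raid6 each unroll factor is its own
algorithm entry, and the boot-time benchmark in lib/raid6/algos.c times
every entry whose ->valid() hook accepts the machine and keeps the
fastest, so the best unroll factor is allowed to differ from board to
board. A sketch of what one such entry looks like -- the struct mirrors
struct raid6_calls in include/linux/raid/pq.h, while the raid6_rvv2_*
and raid6_rvv_valid names are placeholders, not necessarily the
identifiers the patch uses:

#include <stddef.h>

struct raid6_calls {
        void (*gen_syndrome)(int disks, size_t bytes, void **ptrs);
        void (*xor_syndrome)(int disks, int start, int stop,
                             size_t bytes, void **ptrs);
        int  (*valid)(void);    /* is this routine set usable here? */
        const char *name;       /* the name printed by the benchmark */
        int priority;           /* preference among candidates */
};

/* Placeholder prototypes for a 2x-unrolled RVV implementation. */
extern void raid6_rvv2_gen_syndrome(int disks, size_t bytes, void **ptrs);
extern void raid6_rvv2_xor_syndrome(int disks, int start, int stop,
                                    size_t bytes, void **ptrs);
extern int raid6_rvv_valid(void);       /* e.g. check for the V extension */

const struct raid6_calls raid6_rvvx2 = {
        .gen_syndrome   = raid6_rvv2_gen_syndrome,
        .xor_syndrome   = raid6_rvv2_xor_syndrome,
        .valid          = raid6_rvv_valid,
        .name           = "rvvx2",      /* as seen in the logs above */
        .priority       = 1,            /* placeholder value */
};

So whether rvvx8 is worth keeping is an empirical question per machine;
the selection logic simply skips it wherever a lower unroll factor
measures faster, as it did on both boards above.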
>
>
> Can we have a comparison from before and after applying your patch?
>
> In addition, how do you check the correctness of your implementation?
>
> I'll add whatever numbers you provide to the commit log and merge your 
> patch for 6.16.
>
> Thanks a lot,
>
> Alex
>
>
>>
>> Thanks,
>> Chunyan
>>


