[PATCH v2] raid6: arm64: add SVE optimized implementation for syndrome generation

Sat Apr 25 04:46:31 PDT 2026

Hi all!

In future I will look at optimising the current algorithms rather than
at an SVE implementation. There look to be a few interesting cases, and
I will share details in future changes.

Thanks all for the discussion!

On Fri, 17 Apr 2026 at 18:36, Mark Brown <broonie at kernel.org> wrote:
>
> On Fri, Apr 17, 2026 at 04:43:06PM +0200, Ard Biesheuvel wrote:
>
> > On arm64, kernel mode NEON is mostly used to gain access to AES and SHA
> > instructions, and only to a lesser degree to speed up ordinary
> > arithmetic, and so XOR is somewhat of an outlier here.
>
> > Given that Neoverse V1 apparently already carves up ordinary arithmetic
> > performed on 256-bit vectors and operates on 128 bits at a time, I am
> > rather skeptical that we're likely to see any SVE implementations of the
> > crypto extensions soon that are meaningfully faster, given that these
> > are presumably much costlier to implement in terms of gate count, and
> > therefore likely to be split up even on SVE implementations that can
> > perform ordinary arithmetic on 256+ bit vectors in a single cycle. Note
> > that even the arm64 SIMD accelerated CRC implementations rely heavily on
> > 64x64->128 polynomial multiplication.
>
> I'd not be surprised to see something that delivers useful benefits
> using SVE at some point.
>
> > IOW, before we consider kernel mode SVE, I'd like to see some benchmarks
> > for other algorithms too.
>
> Definitely, it needs a solid win to merge anything.  I do want to get
> back to the situation where we've got out of tree infrastructure patches
> so that people working on algorithms have something to base their work
> on (and see the overheads using SVE incurs) but unless there's a
> practical user they should stay out of tree.