[PATCH v2] arm64/xor: use EOR3 instructions when available
Ard Biesheuvel
ardb at kernel.org
Tue Dec 14 04:57:47 PST 2021
On Tue, 14 Dec 2021 at 12:36, Catalin Marinas <catalin.marinas at arm.com> wrote:
>
> On Tue, Dec 14, 2021 at 12:05:34PM +0100, Ard Biesheuvel wrote:
> > On Tue, 14 Dec 2021 at 09:19, Ard Biesheuvel <ardb at kernel.org> wrote:
> > >
> > > + Arnd
> > >
> > > On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan at kernel.org> wrote:
> > > >
> > > > Hi Ard,
> > > >
> > > > On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > > > > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > > > > available, which is the case if the CPU implements the SHA-3 extension.
> > > > > This is about 20% faster on Apple M1 when using the 5-way version.
> > > > >
> > > > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > > >
> > > > Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> > > > EOR3 instructions when available") in the arm64 tree breaks
> > > > allyesconfig:
> > > >
> > > > https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
> > > >
> > > > I also see this when building with GCC 11.2.0:
> > > >
> > > > WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> > > > Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> > > > aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
> > >
> > > I suspect this is another genksyms crash, preventing the
> > > __crc_xor_block_inner_neon symbol from ever being emitted.
> > >
> > > This is a recurring annoyance and I am not sure how to address this
> > > properly. Arnd might have some thoughts on the matter as well.
> >
> > I managed to reproduce this: it's not a crash but definitely a bug in
> > genksyms, as it simply fails to produce the output containing the
> > assignment of __crc_xor_block_inner_neon.
> >
> > Moving the definition of xor_block_inner_neon as below works around the issue.
> >
> > Catalin: would you like me to spin a v3? Or do you prefer to just
> > fold this into the existing one?
>
> I'll fold it in. Thanks.
>
The root cause appears to be that genksyms gives up when it encounters

  static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
  {

because the types are not defined. This is because our
asm/neon-intrinsics.h header avoids #include'ing arm_neon.h in the
context of genksyms, as doing so does result in a genksyms crash.

I have very little motivation to go and figure out why genksyms
crashes in that case, so I think for now, we can stick with the fix I
proposed. Alternatively, we could typedef uint64x2_t to something
arbitrary if __GENKSYMS__ is defined, or use a macro instead of a
static inline for eor3().
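
For illustration only, the two alternatives (a placeholder typedef under
__GENKSYMS__, and a macro in place of the static inline) could look roughly
like the sketch below. This is not the kernel code and has not been run
through genksyms. It assumes a GCC/Clang vector type as a portable stand-in
for the arm_neon.h definition, uses a plain three-way XOR where the real
code would emit the EOR3 instruction, and xor3_pair() is a hypothetical
helper added only to exercise the macro.

```c
#include <stdint.h>
#include <string.h>

#ifdef __GENKSYMS__
/*
 * Alternative 1 (assumption): the symbol-versioning pass only needs the
 * name to parse as a type, so any arbitrary typedef will do here. This
 * branch is never meant to be compiled into object code.
 */
typedef int uint64x2_t;
#else
/*
 * Portable stand-in for the arm_neon.h type, so that this sketch builds
 * with GCC or Clang on any host. Kernel code would get the real type
 * via asm/neon-intrinsics.h instead.
 */
typedef uint64_t uint64x2_t __attribute__((vector_size(16)));
#endif

/*
 * Alternative 2 (assumption): with a macro there is no file-scope
 * function signature mentioning uint64x2_t left after preprocessing.
 * A plain XOR is used here; the real helper would use EOR3.
 */
#define eor3(p, q, r)	((p) ^ (q) ^ (r))

/* Hypothetical helper: XOR one pair of 64-bit words from three buffers. */
static inline void xor3_pair(uint64_t dst[2], const uint64_t a[2],
			     const uint64_t b[2], const uint64_t c[2])
{
	uint64x2_t va, vb, vc, vd;

	/* memcpy avoids alignment and aliasing assumptions on the inputs */
	memcpy(&va, a, sizeof(va));
	memcpy(&vb, b, sizeof(vb));
	memcpy(&vc, c, sizeof(vc));

	vd = eor3(va, vb, vc);

	memcpy(dst, &vd, sizeof(vd));
}
```

Either variant on its own would presumably be enough to avoid the problem;
both appear together here only to keep the sketch short.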