[PATCH v2] arm64/xor: use EOR3 instructions when available

Ard Biesheuvel ardb at kernel.org
Tue Dec 14 03:05:34 PST 2021


On Tue, 14 Dec 2021 at 09:19, Ard Biesheuvel <ardb at kernel.org> wrote:
>
> + Arnd
>
> On Tue, 14 Dec 2021 at 03:37, Nathan Chancellor <nathan at kernel.org> wrote:
> >
> > Hi Ard,
> >
> > On Mon, Dec 13, 2021 at 03:02:52PM +0100, Ard Biesheuvel wrote:
> > > Use the EOR3 instruction to implement xor_blocks() if the instruction is
> > > available, which is the case if the CPU implements the SHA-3 extension.
> > > This is about 20% faster on Apple M1 when using the 5-way version.
> > >
> > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> >
> > Our CI reported that this patch as commit ce9ba49a2460 ("arm64/xor: use
> > EOR3 instructions when available") in the arm64 tree breaks
> > allyesconfig:
> >
> > https://github.com/ClangBuiltLinux/continuous-integration2/runs/4514540083?check_suite_focus=true
> >
> > I also see this when building with GCC 11.2.0:
> >
> > WARNING: modpost: EXPORT symbol "xor_block_inner_neon" [vmlinux] version ...
> > Is "xor_block_inner_neon" prototyped in <asm/asm-prototypes.h>?
> > aarch64-linux-gnu-ld: arch/arm64/lib/xor-neon.o: relocation R_AARCH64_ABS32 against `__crc_xor_block_inner_neon' can not be used when making a shared object
>
> I suspect this is another genksyms crash, preventing the
> __crc_xor_block_inner_neon symbol from ever being emitted.
>
> This is a recurring annoyance and I am not sure how to address this
> properly. Arnd might have some thoughts on the matter as well.
>
>

I managed to reproduce this: it's not a crash but definitely a bug in
genksyms, as it simply fails to produce the output containing the
assignment of __crc_xor_block_inner_neon.

Moving the definition of xor_block_inner_neon as below works around the issue.

Catalin: would you like me to spin a v3? Or do you prefer to just
fold this into the existing one?

diff --git a/arch/arm64/lib/xor-neon.c b/arch/arm64/lib/xor-neon.c
index 5c8688700f63..d189cf4e70ea 100644
--- a/arch/arm64/lib/xor-neon.c
+++ b/arch/arm64/lib/xor-neon.c
@@ -167,6 +167,15 @@ void xor_arm64_neon_5(unsigned long bytes, unsigned long *p1,
        } while (--lines > 0);
 }

+struct xor_block_template xor_block_inner_neon __ro_after_init = {
+       .name   = "__inner_neon__",
+       .do_2   = xor_arm64_neon_2,
+       .do_3   = xor_arm64_neon_3,
+       .do_4   = xor_arm64_neon_4,
+       .do_5   = xor_arm64_neon_5,
+};
+EXPORT_SYMBOL(xor_block_inner_neon);
+
 static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
 {
        uint64x2_t res;
@@ -296,15 +305,6 @@ static void xor_arm64_eor3_5(unsigned long bytes, unsigned long *p1,
        } while (--lines > 0);
 }

-struct xor_block_template xor_block_inner_neon __ro_after_init = {
-       .name   = "__inner_neon__",
-       .do_2   = xor_arm64_neon_2,
-       .do_3   = xor_arm64_neon_3,
-       .do_4   = xor_arm64_neon_4,
-       .do_5   = xor_arm64_neon_5,
-};
-EXPORT_SYMBOL(xor_block_inner_neon);
-
 static int __init xor_neon_init(void)
 {
        if (IS_ENABLED(CONFIG_AS_HAS_SHA3) && cpu_have_named_feature(SHA3)) {
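
For anyone following along without the full patch: the eor3() helper
visible in the context above wraps the EOR3 instruction, which XORs three
operands at once. Below is a rough scalar sketch of that operation, only
meant to illustrate why one three-way XOR can stand in for two chained
two-way XORs. The names xor3_u64() and xor_3way_scalar() are hypothetical
and do not exist in the kernel; the real code works on uint64x2_t NEON
vectors rather than plain 64-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for EOR3: XOR of three operands. */
static inline uint64_t xor3_u64(uint64_t p, uint64_t q, uint64_t r)
{
        return p ^ q ^ r;
}

/*
 * Hypothetical scalar model of a 3-way block XOR: p1[i] ^= p2[i] ^ p3[i].
 * 'bytes' must be a multiple of sizeof(uint64_t).
 */
static void xor_3way_scalar(size_t bytes, uint64_t *p1,
                            const uint64_t *p2, const uint64_t *p3)
{
        size_t i, words = bytes / sizeof(uint64_t);

        assert(bytes % sizeof(uint64_t) == 0);

        for (i = 0; i < words; i++)
                p1[i] = xor3_u64(p1[i], p2[i], p3[i]);
}
```

With more sources, each extra pair of inputs folds into one more
three-way XOR, which is presumably where the gain in the 5-way case
comes from.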


