Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness
Will Deacon
will.deacon at arm.com
Mon Oct 17 11:38:07 PDT 2016
Hi all,
I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
believe that the new compiler behaviour at the heart of the problem
has the potential to affect other architectures and other pieces of
kernel code relying on dead-code elimination to remove deliberately
undefined functions.
The failure looks like:
| drivers/built-in.o: In function `armada_3700_add_composite_clk':
|
| linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
| undefined reference to `____ilog2_NaN'
|
| linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
| relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
| `____ilog2_NaN'
|
| make: *** [vmlinux] Error 1
and if we look at the source for armada_3700_add_composite_clk, we see
that this is caused by:
int table_size = 0;
rate->reg = reg + (u64)rate->reg;
for (clkt = rate->table; clkt->div; clkt++)
table_size++;
rate->width = order_base_2(table_size);
order_base_2 calls ilog2, which has the ____ilog2_NaN call:
#define ilog2(n) \
( \
__builtin_constant_p(n) ? ( \
(n) < 1 ? ____ilog2_NaN() : \
This is because we're in a curious case where GCC has emitted a
special-cased version of armada_3700_add_composite_clk, with table_size
effectively constant-folded as 0. Whilst we shouldn't see this in a
non-buggy kernel (hence the deliberate call to the undefined function
____ilog2_NaN), it means that the final link fails because we have a
____ilog2_NaN in the code, with a runtime check on table_size.
In other words, __builtin_constant_p appears to be weaker than we've
been assuming. Talking to the compiler guys here, this is due to the
"jump-threading" optimisation pass, so the patch below disables that.
A simpler example is:
int foo();
int bar();
int count(int *argc)
{
int table_size = 0;
for (; *argc; argc++)
table_size++;
if (__builtin_constant_p(table_size))
return table_size == 0 ? foo() : bar();
return bar();
}
which compiles to:
count:
ldr w0, [x0]
cbz w0, .L4
b bar
.p2align 3
.L4:
b foo
and, with the "optimisation" disabled:
count:
b bar
Thoughts? It feels awfully fragile disabling passes like this, but with
GCC transforming the code like this, I can't immediately think of a way
to preserve the intended behaviour of the code.
Will
--->8
diff --git a/Makefile b/Makefile
index 512e47a53e9a..750873d6d11e 100644
--- a/Makefile
+++ b/Makefile
@@ -641,6 +641,11 @@ endif
# Tell gcc to never replace conditional load with a non-conditional one
KBUILD_CFLAGS += $(call cc-option,--param=allow-store-data-races=0)
+# Stop gcc from converting switches into a form that defeats dead code
+# elimination and can subsequently lead to calls to intentionally
+# undefined functions appearing in the final link.
+KBUILD_CFLAGS += $(call cc-option,--param=max-fsm-thread-path-insns=1)
+
include scripts/Makefile.gcc-plugins
ifdef CONFIG_READABLE_ASM
More information about the linux-arm-kernel
mailing list