Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness

Will Deacon will.deacon at
Mon Oct 17 11:38:07 PDT 2016

Hi all,

I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
believe that the new compiler behaviour at the heart of the problem
has the potential to affect other architectures and other pieces of
kernel code relying on dead-code elimination to remove deliberately
undefined functions.

The failure looks like:

  | drivers/built-in.o: In function `armada_3700_add_composite_clk':
  | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
  | undefined reference to `____ilog2_NaN'
  | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
  | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
  | `____ilog2_NaN'
  | make: *** [vmlinux] Error 1

and if we look at the source for armada_3700_add_composite_clk, we see
that this is caused by:

  int table_size = 0;

  rate->reg = reg + (u64)rate->reg;
  for (clkt = rate->table; clkt->div; clkt++)
  rate->width = order_base_2(table_size);

order_base_2 calls ilog2, which has the ____ilog2_NaN call:

#define ilog2(n)				\
(						\
	__builtin_constant_p(n) ? (		\
		(n) < 1 ? ____ilog2_NaN() :	\

This is because we're in a curious case where GCC has emitted a
special-cased version of armada_3700_add_composite_clk, with table_size
effectively constant-folded as 0. Whilst we shouldn't see this in a
non-buggy kernel (hence the deliberate call to the undefined function
____ilog2_NaN), it means that the final link fails because we have a
____ilog2_NaN in the code, with a runtime check on table_size.

In other words, __builtin_constant_p appears to be weaker than we've
been assuming. Talking to the compiler guys here, this is due to the
"jump-threading" optimisation pass, so the patch below disables that.

A simpler example is:

int foo();
int bar();

int count(int *argc)
	int table_size = 0;

	for (; *argc; argc++)

	if (__builtin_constant_p(table_size))
		return table_size == 0 ? foo() : bar();

	return bar();

which compiles to:

	ldr	w0, [x0]
	cbz	w0, .L4
	b	bar
	.p2align 3
	b	foo

and, with the "optimisation" disabled:

	b	bar

Thoughts? It feels awfully fragile disabling passes like this, but with
GCC transforming the code like this, I can't immediately think of a way
to preserve the intended behaviour of the code.



diff --git a/Makefile b/Makefile
index 512e47a53e9a..750873d6d11e 100644
--- a/Makefile
+++ b/Makefile
@@ -641,6 +641,11 @@ endif
 # Tell gcc to never replace conditional load with a non-conditional one
 KBUILD_CFLAGS	+= $(call cc-option,--param=allow-store-data-races=0)
+# Stop gcc from converting switches into a form that defeats dead code
+# elimination and can subsequently lead to calls to intentionally
+# undefined functions appearing in the final link.
+KBUILD_CFLAGS	+= $(call cc-option,--param=max-fsm-thread-path-insns=1)
 include scripts/Makefile.gcc-plugins

More information about the linux-arm-kernel mailing list