[PATCH] arm64: jump_label: use constraint "S" instead of "i"

Thu Feb 1 12:56:14 PST 2024

On 2024-02-01, Dave Martin wrote:
>On Thu, Feb 01, 2024 at 05:07:59PM +0100, Ard Biesheuvel wrote:
>> On Thu, 1 Feb 2024 at 12:50, Dave Martin <Dave.Martin at arm.com> wrote:
>> >
>> > On Thu, Feb 01, 2024 at 01:11:20AM -0800, Fangrui Song wrote:
>> ...
>> > >
>> > > Hmm. Clang's constraint "S" implementation doesn't support a constant
>> > > offset, so
>> > > `static_key_false(&nf_hooks_needed[pf][hook])` in include/linux/netfilter.h:nf_hook
>> > > would not compile with Clang <= 18.
>> > >
>> > > I have a patch https://github.com/llvm/llvm-project/pull/80255 , but
>> > > even if it is accepted and cherry-picked into the 18.x release branch,
>> > > if we still support older Clang, we cannot use "S" unconditionally.
>> > >
>> > >
>> > > So we probably need the following to prepare for -fPIE support in the
>> > > future:
>> > >
>> > > diff --git a/arch/arm64/include/asm/jump_label.h b/arch/arm64/include/asm/jump_label.h
>> > > index 48ddc0f45d22..b8af2f8b0c99 100644
>> > > --- a/arch/arm64/include/asm/jump_label.h
>> > > +++ b/arch/arm64/include/asm/jump_label.h
>> > > @@ -15,6 +15,16 @@
>> > >  #define JUMP_LABEL_NOP_SIZE          AARCH64_INSN_SIZE
>> > > +/*
>> > > + * Prefer "S" to support PIC. However, use "i" for Clang 18 and earlier as "S"
>> > > + * on a symbol with a constant offset is not supported.
>> > > + */
>> > > +#if defined(CONFIG_CC_IS_CLANG) && __clang_major__ <= 18
>> >
>> > Is a GCC version check needed?  Or is the minimum GCC version specified
>> > for building the kernel new enough?
>> >
>> > > +#define JUMP_LABEL_STATIC_KEY_CONSTRAINT "i"
>> > > +#else
>> > > +#define JUMP_LABEL_STATIC_KEY_CONSTRAINT "S"
>> > > +#endif
>> > > +
>>
>> Can we use "Si" instead?
>
>I thought the point was to avoid "S" on compilers that would choke on
>it?  If so, those compilers would surely choke on "Si" too, no?

"Si" is an invalid constraint. Unfortunately, GCC recognizes "S" as a
valid constraint and ignores trailing characters (asm_operand_ok). That
is, GCC would accept "Siiiii" as well...

The GCC support for "S" is great. The initial aarch64 port in 2012 supports "S".

>> > >  static __always_inline bool arch_static_branch(struct static_key * const key,
>> > >                                              const bool branch)
>> > >  {
>> > > @@ -23,9 +33,9 @@ static __always_inline bool arch_static_branch(struct static_key * const key,
>> > >                "      .pushsection    __jump_table, \"aw\"    \n\t"
>> > >                "      .align          3                       \n\t"
>> > >                "      .long           1b - ., %l[l_yes] - .   \n\t"
>> > > -              "      .quad           %c0 - .                 \n\t"
>> > > +              "      .quad           %0 + %1 - .             \n\t"
>> > >                "      .popsection                             \n\t"
>> > > -              :  :  "i"(&((char *)key)[branch]) :  : l_yes);
>> > > +              :  :  JUMP_LABEL_STATIC_KEY_CONSTRAINT(key), "i"(branch) :  : l_yes);
>> >
>> > While this looks reasonable, I'm still not clear now why the asm must do
>> > the addition.
>> >
>> > Can we roll the branch argument into the macro?
>> >
>> > "S"(symbol + constant) seems to do the right thing for AArch64, at least
>> > in GCC.
>> >
>>
>> 'symbol' is a struct static_key pointer, so adding '1' to it will not
>> produce the needed result.
>
>I mean loosely things that resolve to something of the form
>"symbol + constant" in the compiler output.
>
>"S"(&((char *)key)[branch]) does indeed seem to do the right thing,
>at least with GCC.
>
>I probably didn't help by bikeshedding the way that expression was
>written, apologies for that.  It's orthogonal to what this patch is
>doing.

Yes, "S"(&((char *)key)[branch])  would do the right thing.
I have compared assembly output. It's a matter of "s" vs "s + 0" and "s+1" vs "s + 1".

>>
>> ...
>> > > > > >If there's another reason why "S" is advantageous though, I'm happy to
>> > > > > >be corrected.
>> > > > >
>> > > > > I remember that Ard has an RFC
>> > > > > https://lore.kernel.org/linux-arm-kernel/20220427171241.2426592-1-ardb@kernel.org/
>> > > > > "[RFC PATCH 0/2] arm64: use PIE code generation for KASLR kernel"
>> > > > > and see some recent PIE codegen patches.
>> > > > >
>> > > > > > Building the KASLR kernel without -fpie but linking it with -pie works
>> > > > > > in practice, but it is not something that is explicitly supported by the
>> > > > > > toolchains - it happens to work because the default 'small' code model
>> > > > > > used by both GCC and Clang relies mostly on ADRP+ADD/LDR to generate
>> > > > > > symbol references.
>> >
>> > Does this mean that doing boot-time relocation on a fully statically
>> > linked kernel doesn't bring much benefit?
>>
>> Not sure I follow you here. The boot-time relocation is necessary in
>> any case, to fix up statically initialized pointer variables all over
>> the kernel.
>>
>> > It would tend to be more
>> > painful to do the relocation work, and mean that the kernel text has to
>> > be temporarily writeable, but static linking should have the best
>> > runtime performance.
>> >
>>
>> Not sure what you mean by 'static linking' here.
>>
>> The only thing PIE linking does in the case of the kernel is
>> a) complain if static relocations are used that cannot be fixed up at
>> runtime (e.g., movk/movz sequences)
>> b) emit dynamic relocations that the early startup code can use to fix
>> up all statically initialized pointers
>>
>> From a codegen point-of-view, there is no difference because we don't
>> use -fpic code. The problem with -fpic codegen is that it makes some
>> assumptions that only hold for shared ELF objects, e.g., avoiding
>> dynamic relocations in .text, using GOT entries even for variables
>> defined in the same compilation so that they can be preempted, keeping
>> all runtime relocatable quantities together so the CoW footprint of
>> the shared library is minimized.
>>
>> None of this matters for the kernel, and as it turns out, the non-PIC
>> small C model produces code that the PIE linker is happy with.
>>
>> TL;DR our code (including inline and out-of-line asm) is already PIC,
>> and this property is relied upon by KASLR.
>>
>> > Maybe it doesn't make as much difference as I thought (certainly ADRP
>> > based addressing should make a fair amount of the PIC/PIE overhead go
>> > away).
>> >
>>
>> The PIC/PIE overhead is in the indirections via the GOT. However,
>> recent linkers will actually perform some relaxations too: if a symbol
>> is defined in the same executable and is not preemptible, a GOT load
>>
>> ADRP
>> LDR
>>
>> can be transparently converted to
>>
>> ADRP
>> ADD
>>
>> referring to the symbol directly. This is also where hidden.h comes
>> in: making all symbol declarations and definitions 'hidden' will allow
>> the compiler to assume that a GOT entry is never needed.
>
>Is there a reason why we don't just build the whole kernel with
>-fvisibility=hidden today?

This topic, loosely related to this patch, is about switching to PIC
compiles. I am not familiar with the Linux kernel, so I'll mostly leave
the discussion to you and Ard :)

That said, I have done a lot of Clang work in visibility/symbol
preemptibility and linkers, so I am happy to add what I know.

-fvisibility=hidden only applies to definitions, not non-definition
declarations.

I've mentioned this at
https://lore.kernel.org/all/20220429070318.iwj3j5lpfkw4t7g2@google.com/

     `#pragma GCC visibility push(hidden)` is very similar to -fvisibility=hidden -fdirect-access-external-data with Clang.
     ...
     The kernel uses neither TLS nor -fno-plt, so -fvisibility=hidden
     -fdirect-access-external-data can replace `#pragma GCC visibility
     push(hidden)`.

>> So building with -fPIC is currently not need in practice, and creates
>> some complications, which is why we have been avoiding it. But PIE
>> linking non-PIC objects is not guaranteed to remain supported going
>> forward, so it would be better to have a plan B, i.e., being able to
>> turn on -fpic without massive regressions due to GOT overhead, or
>> codegen problems with our asm hacks.
>
>Summarising all of this is it right that:
>
>1) ld -pie is how we get the reloc info into the kernel for KASLR
>today.
>
>2) We use gcc -fno-pic today, but this might break with ld -pie in the
>future, although it works for now.
>
>3) gcc -fno-pic and gcc -fPIC (or -fPIE) generate almost the same code,
>assuming we tweak symbol visibility and use a memory model that
>ADR+ADD/LDR can span.  So, moving to -fPIE is likely to be do-able.
>
>
>My point is that an alternative option would be to move to ld -no-pie.
>We would need another way to get relocs into the kernel, such as an
>intermediate step with ld --emit-relocs.  I have definitely seen this
>done before somewhere, but it would be extra work and not necessarily
>worth it, based on what you say about code gen.
>
>This may all have been discussed to death already -- I didn't mean to
>hijack this patch, so I'll stop digging here, but if I've misunderstood,
>please shout!
>
>Cheers
>---Dave