[PATCH v5sub2 1/8] arm64: add support for module PLTs
Will Deacon
will.deacon at arm.com
Thu Feb 25 08:07:15 PST 2016
Hi Ard,
On Mon, Feb 01, 2016 at 02:09:31PM +0100, Ard Biesheuvel wrote:
> This adds support for emitting PLTs at module load time for relative
> branches that are out of range. This is a prerequisite for KASLR, which
> may place the kernel and the modules anywhere in the vmalloc area,
> making it more likely that branch target offsets exceed the maximum
> range of +/- 128 MB.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
> ---
>
> In this version, I removed the distinction between relocations against
> .init executable sections and ordinary executable sections. The reason
> is that it is hardly worth the trouble, given that .init.text usually
> does not contain that many far branches, and this version now only
> reserves PLT entry space for jump and call relocations against undefined
> symbols (since symbols defined in the same module can be assumed to be
> within +/- 128 MB)
>
> For example, the mac80211.ko module (which is fairly sizable at ~400 KB)
> built with -mcmodel=large gives the following relocation counts:
>
>                  relocs  branches  unique  !local
> .text              3925      3347     518     219
> .init.text           11         8       7       1
> .exit.text            4         4       4       1
> .text.unlikely       81        67      36      17
>
> ('unique' means branches to unique type/symbol/addend combos, of which
> !local is the subset referring to undefined symbols)
>
> IOW, we are only emitting a single PLT entry for the .init sections, and
> we are better off just adding it to the core PLT section instead.
> ---
> arch/arm64/Kconfig | 9 +
> arch/arm64/Makefile | 6 +-
> arch/arm64/include/asm/module.h | 11 ++
> arch/arm64/kernel/Makefile | 1 +
> arch/arm64/kernel/module-plts.c | 201 ++++++++++++++++++++
> arch/arm64/kernel/module.c | 12 ++
> arch/arm64/kernel/module.lds | 3 +
> 7 files changed, 242 insertions(+), 1 deletion(-)
[...]
> +struct plt_entry {
> + /*
> + * A program that conforms to the AArch64 Procedure Call Standard
> + * (AAPCS64) must assume that a veneer that alters IP0 (x16) and/or
> + * IP1 (x17) may be inserted at any branch instruction that is
> + * exposed to a relocation that supports long branches. Since that
> + * is exactly what we are dealing with here, we are free to use x16
> + * as a scratch register in the PLT veneers.
> + */
> + __le32 mov0; /* movn x16, #0x.... */
> + __le32 mov1; /* movk x16, #0x...., lsl #16 */
> + __le32 mov2; /* movk x16, #0x...., lsl #32 */
> + __le32 br; /* br x16 */
> +};
I'm worried about this code when CONFIG_ARM64_LSE_ATOMICS=y but the LSE
atomics are not detected on the CPU at runtime. In that case, all atomic
operations are moved out-of-line and called via a bl instruction from
inline asm.
The out-of-line code is compiled with magic GCC options to force the
explicit save/restore of all used registers (see arch/arm64/lib/Makefile),
otherwise we'd have to clutter the inline asm with constraints that
wouldn't be needed had we managed to patch the bl with an LSE atomic
instruction.
If you're emitting a PLT, couldn't we end up with silent corruption of
x16 for modules using out-of-line atomics like this?
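To make the hazard concrete, here is a hypothetical instruction sequence (the symbol name is invented for illustration): the caller keeps a value live in x16 across the bl, which is safe as long as the out-of-line atomic saves and restores every register it uses, but not once the module loader redirects the bl through a PLT veneer.

```
	mov	x16, x2			// live across the call: legal here,
					// since the out-of-line atomics are
					// built to save/restore all registers
	bl	__ll_sc_atomic_add	// hypothetical out-of-line LL/SC atomic
	add	x0, x0, x16		// still expects the value from x2

	// But if the callee is out of branch range, the bl is redirected
	// to a veneer like the plt_entry above:
	//	movn	x16, #0x....		// <-- x16 silently clobbered
	//	movk	x16, #0x...., lsl #16
	//	movk	x16, #0x...., lsl #32
	//	br	x16
```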
Will