[PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

Catalin Marinas catalin.marinas at arm.com
Fri Mar 19 12:25:08 GMT 2021


On Thu, Mar 18, 2021 at 09:41:54AM +0100, Arnd Bergmann wrote:
> On Wed, Mar 17, 2021 at 5:18 PM Catalin Marinas <catalin.marinas at arm.com> wrote:
> >
> > On Wed, Mar 17, 2021 at 02:37:57PM +0000, Catalin Marinas wrote:
> > > On Thu, Feb 25, 2021 at 12:20:56PM +0100, Arnd Bergmann wrote:
> > > > diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
> > > > index bad2b9eaab22..926cdb597a45 100644
> > > > --- a/arch/arm64/kernel/vmlinux.lds.S
> > > > +++ b/arch/arm64/kernel/vmlinux.lds.S
> > > > @@ -217,7 +217,7 @@ SECTIONS
> > > >             INIT_CALLS
> > > >             CON_INITCALL
> > > >             INIT_RAM_FS
> > > > -           *(.init.altinstructions .init.bss .init.bss.*)  /* from the EFI stub */
> > > > +           *(.init.altinstructions .init.data.* .init.bss .init.bss.*)     /* from the EFI stub */
> > >
> > > INIT_DATA already covers .init.data and .init.data.*, so I don't think
> > > we need this change.
> >
> > Ah, INIT_DATA only covers init.data.* (so no dot in front). The above
> > is needed for the EFI stub.
> 
> I wonder if that is just a typo in INIT_DATA. Nico introduced it as part of
> 266ff2a8f51f ("kbuild: Fix asm-generic/vmlinux.lds.h for
> LD_DEAD_CODE_DATA_ELIMINATION"), so perhaps that should have
> been .init.data.* instead.

I think it was the other Nicholas ;) (with an 'h'). The vmlinux.lds.h
change indeed looks like a typo (it's been around since 4.18).
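
Assuming the input-section pattern in INIT_DATA currently reads as
described above (no dot in front of init.data.*), the fix would
presumably be something like the following. Untested sketch, showing
only the pattern and omitting the rest of the macro:

	-	*(.init.data init.data.*)
	+	*(.init.data .init.data.*)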

> > However, I gave this a quick try and under Qemu with -cpu max and -smp 2
> > (or more) it fails as below. I haven't debugged but the lr points to
> > just after the switch_to() call. Maybe some section got discarded and we
> > patched in the wrong instructions. It is fine with -cpu host or -smp 1.
> 
> Ah, interesting.
> 
> > -------------------8<------------------------
> > smp: Bringing up secondary CPUs ...
> > Detected PIPT I-cache on CPU1
> > CPU1: Booted secondary processor 0x0000000001 [0x000f0510]
> > Unable to handle kernel paging request at virtual address eb91d81ad2971160
> > Mem abort info:
> >   ESR = 0x86000004
> >   EC = 0x21: IABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> > [eb91d81ad2971160] address between user and kernel address ranges
> > Internal error: Oops: 86000004 [#1] PREEMPT SMP
> > Modules linked in:
> > CPU: 1 PID: 16 Comm: migration/1 Not tainted 5.12.0-rc3-00002-g128e977c1322 #1
> > Stopper: 0x0 <- 0x0
> > pstate: 60000085 (nZCv daIf -PAN -UAO -TCO BTYPE=--)
> > pc : 0xeb91d81ad2971160
> > lr : __schedule+0x230/0x6b8
> > sp : ffff80001009bd60
> > x29: ffff80001009bd60 x28: 0000000000000000
> > x27: ffff0000000a6760 x26: ffff0000000b7540
> > x25: 0080000000000000 x24: ffffd81ad3969000
> > x23: ffff0000000a6200 x22: 6ee0d81ad2971658
> > x21: ffff0000000a6200 x20: ffff000000080000
> > x19: ffff00007fbc6bc0 x18: 0000000000000030
> > x17: 0000000000000000 x16: 0000000000000000
> > x15: 00008952b30a9a9e x14: 0000000000000366
> > x13: 0000000000000192 x12: 0000000000000000
> > x11: 0000000000000003 x10: 00000000000009b0
> > x9 : ffff80001009bd30 x8 : ffff0000000a6c10
> > x7 : ffff00007fbc6cc0 x6 : 00000000fffedb30
> > x5 : 00000000ffffffff x4 : 0000000000000000
> > x3 : 0000000000000008 x2 : 0000000000000000
> > x1 : ffff0000000a6200 x0 : ffff0000000a3800
> > Call trace:
> >  0xeb91d81ad2971160
> >  schedule+0x70/0x108
> >  schedule_preempt_disabled+0x24/0x40
> >  __kthread_parkme+0x68/0xd0
> >  kthread+0x138/0x170
> >  ret_from_fork+0x10/0x30
> > Code: bad PC value
> > ---[ end trace af3481062ecef3e7 ]---
> 
> This looks like it has just returned from __schedule() to schedule()
> and is trying to return from that as well, through code like this:
> 
> .L562:
> // /git/arm-soc/kernel/sched/core.c:5159: }
>         ldp     x19, x20, [sp, 16]      //,,
>         ldp     x29, x30, [sp], 32      //,,,
>         hint    29 // autiasp
>         ret
> 
> It looks like pointer authentication gone wrong, which ended up
> with dereferencing the broken pointer in x22, and it explains why
> it only happens with -cpu max. Presumably this also only happens
> on secondary CPUs, so maybe the bit that initializes PAC on
> secondary CPUs got discarded?

It seems that the whole alternative instructions section is gone, so any
run-time code patching that the kernel does won't work. The kernel boots
with the diff below, but I'm not convinced that nothing else is missing.
In some cases you get a linker warning about gc sections, but not in this
case. Maybe we need some more asserts to ensure that certain sections
are not empty.

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 11909782ee3e..036cc59033d3 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -203,7 +203,7 @@ SECTIONS
 	. = ALIGN(4);
 	.altinstructions : {
 		__alt_instructions = .;
-		*(.altinstructions)
+		KEEP(*(.altinstructions))
 		__alt_instructions_end = .;
 	}
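
As for the asserts, a minimal sketch of what I have in mind, reusing the
__alt_instructions/__alt_instructions_end symbols from the hunk above.
Untested; the message text is made up, and it assumes the check sits
with the other ASSERT()s at the end of the linker script:

	/* Fail the link if --gc-sections dropped all alternatives. */
	ASSERT(__alt_instructions_end > __alt_instructions,
	       ".altinstructions is empty (discarded by --gc-sections?)")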

Do we need a KEEP() around .init.altinstructions as well?
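
If it turns out to be needed, it would presumably mean splitting the EFI
stub line from the earlier patch along these lines. Untested sketch, and
it assumes the .init.data.* addition from that patch is applied:

	KEEP(*(.init.altinstructions))
	*(.init.data.* .init.bss .init.bss.*)	/* from the EFI stub */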

BTW, the build fails with CONFIG_FUNCTION_TRACER enabled:

aarch64-linux-gnu-ld: init/main.o(__patchable_function_entries): error: need linked-to section for --gc-sections

-- 
Catalin


