FAIL (91/181 SKIPPED): Test report for for-kernelci (6.2.0-rc5, arm-next, 2e84eedb)

Ard Biesheuvel ardb at kernel.org
Thu Jan 26 10:33:22 PST 2023


On Thu, 26 Jan 2023 at 18:25, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Thu, Jan 26, 2023 at 03:17:16PM +0000, Will Deacon wrote:
> > On Thu, Jan 26, 2023 at 03:09:58PM +0000, Mark Rutland wrote:
> > > On Thu, Jan 26, 2023 at 12:52:03PM +0000, Will Deacon wrote:
> > > > [+Mark and Ard in case they have ideas]
> > > >
> > > > On Wed, Jan 25, 2023 at 09:21:19AM -0000, cki-project at redhat.com wrote:
> > > > > Hi, we tested your kernel and here are the results:
> > > > >
> > > > >     Overall result: FAILED
> > > > >              Merge: OK
> > > > >            Compile: OK
> > > > >               Test: FAILED
> > > > >
> > > > >
> > > > > Kernel information:
> > > > >     Commit message: Merge branch 'for-next/core' into for-kernelci
> > > > >
> > > > > You can find all the details about the test run at
> > > > >     https://datawarehouse.cki-project.org/kcidb/checkouts/66828
> > > > >
> > > > > One or more kernel tests failed:
> > > > >     Unrecognized or new issues:
> > > > >          aarch64 - kdump - kexec_boot
> > > > >                    Logs: https://datawarehouse.cki-project.org/kcidb/tests/6799495
> > > >
> > > > This looks like we run into an undefined instruction when we jump to the
> > > > kexec relocation code. Do you know if the failure is reproducible, and is
> > > > the log identical each time?
> > >
> > > I had a go in a QEMU KVM VM on ThunderX2, and a QEMU TCG VM. With defconfig
> > > I don't see the issue, but with the config from the test run link above I
> > > consistently see the issue both under KVM and TCG (logs below).
> > >
> > > It should be simple enough to figure out which config option is tickling this;
> > > I'll go dig in to that..
> >
> > Cheers, Mark. If you get a chance, it's probably also worth testing vanilla
> > -rc5 to confirm that it's a regression in our queue (which we could
> > assumedly bisect if necessary).
>
> I have met the enemy, and he is me:
>
> | git bisect start
> | # good: [2241ab53cbb5cdb08a6b2d4688feb13971058f65] Linux 6.2-rc5
> | git bisect good 2241ab53cbb5cdb08a6b2d4688feb13971058f65
> | # bad: [2e84eedb182e43a9113c2c83cc3373c2ae99ce19] Merge branch 'for-next/core' into for-kernelci
> | git bisect bad 2e84eedb182e43a9113c2c83cc3373c2ae99ce19
> | # good: [3eb1b41fba97a1586e3ecca8c10547071f541567] kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
> | git bisect good 3eb1b41fba97a1586e3ecca8c10547071f541567
> | # good: [daac835347a52d9d141be281e4657cc08a360e97] kselftest/arm64: Correct buffer size for SME ZA storage
> | git bisect good daac835347a52d9d141be281e4657cc08a360e97
> | # bad: [baaf553d3bc330697c68a00f96cf11f4edfeac7e] arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
> | git bisect bad baaf553d3bc330697c68a00f96cf11f4edfeac7e
> | # good: [47a15aa544279d34e14e17ca3b5855e39b946cec] arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
> | git bisect good 47a15aa544279d34e14e17ca3b5855e39b946cec
> | # good: [e4ecbe83fd1a5428d5458de04a3404f1b5444429] arm64: patching: Add aarch64_insn_write_literal_u64()
> | git bisect good e4ecbe83fd1a5428d5458de04a3404f1b5444429
> | # good: [90955d778ad7873964a271852b1f24d31e00248b] arm64: ftrace: Update stale comment
> | git bisect good 90955d778ad7873964a271852b1f24d31e00248b
> | # first bad commit: [baaf553d3bc330697c68a00f96cf11f4edfeac7e] arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
>
> It looks like this is down to the function alignment; reverting that commit makes it go away, but if I add:
>
> | diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> | index 6914f6bf41e22..8dafeea05864e 100644
> | --- a/arch/arm64/Kconfig
> | +++ b/arch/arm64/Kconfig
> | @@ -123,7 +123,7 @@ config ARM64
> |         select DMA_DIRECT_REMAP
> |         select EDAC_SUPPORT
> |         select FRAME_POINTER
> | -       select FUNCTION_ALIGNMENT_4B
> | +       select FUNCTION_ALIGNMENT_8B    # HACK HACK HACK
> |         select GENERIC_ALLOCATOR
> |         select GENERIC_ARCH_TOPOLOGY
> |         select GENERIC_CLOCKEVENTS_BROADCAST
>
> ... then it blows up again.
>
> So we're probably doing a clever address calculation in the kexec idmap code
> that ends up being wrong when the code gets shuffled a bit; possibly a
> mismatched caller/callee alignment.
>

$ grep -1 arm64_reloca System.map
ffff80000b0320fc T __relocate_new_kernel_start
ffff80000b032100 T arm64_relocate_new_kernel
ffff80000b032220 T __relocate_new_kernel_end

so the alignment results in arm64_relocate_new_kernel() appearing 4 bytes
past the start of the section, and the kexec code takes the address of the
section start rather than the address of the function symbol.
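
To make the arithmetic explicit, here is a minimal Python sketch that
reads the gap off the System.map lines quoted above. The helper names
(parse_system_map, symbol_gap) are made up for illustration and are not
part of any kernel tooling.

```python
# The three System.map lines quoted in this mail, verbatim.
SYSTEM_MAP = """\
ffff80000b0320fc T __relocate_new_kernel_start
ffff80000b032100 T arm64_relocate_new_kernel
ffff80000b032220 T __relocate_new_kernel_end
"""


def parse_system_map(text):
    """Return {symbol: address} for each 'address type name' line."""
    symbols = {}
    for line in text.splitlines():
        addr, _sym_type, name = line.split()
        symbols[name] = int(addr, 16)
    return symbols


def symbol_gap(symbols, start, func):
    """Bytes between a section-start marker and a function symbol."""
    return symbols[func] - symbols[start]


syms = parse_system_map(SYSTEM_MAP)
gap = symbol_gap(syms, "__relocate_new_kernel_start",
                 "arm64_relocate_new_kernel")
# Offset of the section start within an 8-byte granule.
start_misalignment = syms["__relocate_new_kernel_start"] % 8

print(f"function starts {gap} bytes past the section start")
print(f"section start is {start_misalignment} bytes off an 8-byte boundary")
```

The section start is only 4-byte aligned, so an 8-byte-aligned function
placed first in the section gets padded away from the start marker.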

Something like the below should fix it, although it would arguably be
better to clean up the kexec code (but I'm reluctant to make a clean
spot):

index 407415a5163ab62f..89720ec2d830d51c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -102,6 +102,7 @@ jiffies = jiffies_64;

 #ifdef CONFIG_KEXEC_CORE
 #define KEXEC_TEXT                                     \
+       . = ALIGN(64);                                  \
        __relocate_new_kernel_start = .;                \
        *(.kexec_relocate.text)                         \
        __relocate_new_kernel_end = .;
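
As a rough illustration of why the ALIGN(64) helps, the Python sketch
below rounds the old section-start address from System.map up to a
64-byte boundary. It assumes arm64_relocate_new_kernel is the first
thing in .kexec_relocate.text and needs no more than 64-byte alignment;
the real link would of course place things at different addresses, so
this only shows the rounding, not the resulting layout. align_up is a
hypothetical helper.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


# Old value of __relocate_new_kernel_start, taken from System.map above.
old_start = 0xffff80000b0320fc

# What '. = ALIGN(64)' would do to a location counter at that address.
new_start = align_up(old_start, 64)

# With the marker on a 64-byte boundary, an 8-byte-aligned function that
# comes first in the section needs no padding in front of it.
func_addr = align_up(new_start, 8)

print(hex(new_start))
print(func_addr - new_start)
```

With the marker aligned at least as strictly as the function, the
section-start symbol and the function symbol coincide again.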


