Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)

Veronika Kabatova vkabatov at redhat.com
Wed Feb 10 12:07:23 EST 2021



----- Original Message -----
> From: "Will Deacon" <will at kernel.org>
> To: "Veronika Kabatova" <vkabatov at redhat.com>
> Cc: "catalin marinas" <catalin.marinas at arm.com>, "CKI Project" <cki-project at redhat.com>,
> linux-arm-kernel at lists.infradead.org
> Sent: Wednesday, February 10, 2021 5:09:37 PM
> Subject: Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)
> 
> Hi Veronika,
> 
> Thanks for the help with this.
> 
> On Wed, Feb 10, 2021 at 10:24:31AM -0500, Veronika Kabatova wrote:
> > > > On Tue, Feb 09, 2021 at 09:07:50PM -0000, CKI Project wrote:
> > > > >     Host 2:
> > > > >        ❌ Boot test
> > > > >        ⚡⚡⚡ selinux-policy: serge-testsuite
> > > > >        ⚡⚡⚡ storage: software RAID testing
> > > > >        🚧 ⚡⚡⚡ xfstests - ext4
> > > > >        🚧 ⚡⚡⚡ xfstests - xfs
> > > > >        🚧 ⚡⚡⚡ xfstests - btrfs
> > > > >        🚧 ⚡⚡⚡ IPMI driver test
> > > > >        🚧 ⚡⚡⚡ IPMItool loop stress test
> > > > >        🚧 ⚡⚡⚡ Storage blktests
> > > > >        🚧 ⚡⚡⚡ Storage block - filesystem fio test
> > > > >        🚧 ⚡⚡⚡ Storage block - queue scheduler test
> > > > >        🚧 ⚡⚡⚡ Storage nvme - tcp
> > > > >        🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
> > > > >        🚧 ⚡⚡⚡ stress: stress-ng
> > > > 
> > > > Which system (e.g. soc) is host 2 and are there any known infra issues at
> > > > the moment? I did push some changes which affect the early boot path, so we
> > > > may well be running into a kernel bug, but I'd just like to make sure before
> > > > we dive in trying to debug that, especially as we haven't seen failures on
> > > > other systems (and host 1 seems ok).
> > > > 
> > > 
> > > Hi, the machine in question is a Cavium ThunderX2 Sabre. It booted a stable
> > > kernel just a few days back okay. The last messages I can see in the raw
> > > console log from this run are:
> > > 
> > > EFI stub: Booting Linux Kernel...
> > > EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> > > EFI stub: Using DTB from configuration table
> > > EFI stub: Exiting boot services and installing virtual address map...
> > > 
> > > and then it times out after an hour and a half. I'm not aware of any ongoing
> > > issues, however the link between the lab controller and the machines can
> > > sometimes go wrong after a reboot and lead to a similar-looking
> > > problem.
> > > 
> > > I'll resubmit the test job on that same machine to check if that was
> > > the case and let you know right after it boots.
> > > 
> > 
> > Hi, I have a few results back:
> > 
> > - resubmitted the same kernel: gets stuck in the same spot
> > - tried the new version pushed today: gets stuck in the same spot
> 
> That's odd, as I just received a pass report for that branch!
> 
> https://lore.kernel.org/r/cki.598435E2D5.M3C5MKJ1NV@redhat.com
> 
> Is it just flakey, perhaps? Obviously, that's not great either, but it will
> make bisection more challenging.
> 

We have a large number of machines (both physical and virtual) and it's
impossible to run all tests on all of them, so they are randomly picked as
long as they fit the distro and test requirements. The distribution for
the ARM tree is 1 physical and 1 "any" machine (which usually ends up being
virtual). The jobs from the report you linked ran on different machines
and didn't pick the one that failed to boot previously, so I manually
forced my testing to pick that machine to eliminate some variables.
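
As a rough sketch of the selection described above (hypothetical names and
data layout, not the actual CKI scheduling code): one physical and one "any"
machine are picked at random from the hosts that fit the requirements, with
an optional override to force a specific host.

```python
import random


def pick_machines(pool, forced_host=None, rng=random):
    """Pick one physical machine and one "any" machine from eligible hosts.

    `pool` is a list of dicts with the hypothetical keys "name",
    "physical" and "fits_requirements". `forced_host` pins the physical
    slot to a named machine instead of choosing it at random.
    """
    eligible = [m for m in pool if m["fits_requirements"]]

    if forced_host is not None:
        physical = next((m for m in eligible if m["name"] == forced_host), None)
        if physical is None:
            raise ValueError(f"{forced_host} does not fit the requirements")
    else:
        physical = rng.choice([m for m in eligible if m["physical"]])

    # The "any" slot may be physical or virtual, but must be a different host.
    rest = [m for m in eligible if m["name"] != physical["name"]]
    any_machine = rng.choice(rest)
    return physical, any_machine
```

Forcing the host removes the random choice as a variable when comparing
kernels on the machine that failed to boot.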

The machine in question can of course be somewhat flaky (hard to eliminate
that possibility completely), but I checked our historical data and it
didn't fail to boot a single time other than with these two new kernels.

> > - tried the version from last week: boots ok
> >
> > There is an extra message from the run that managed to boot, which is not
> > present with any of the runs that failed:
> > 
> > EFI stub: ERROR: FIRMWARE BUG: efi_loaded_image_t::image_base has bogus value
> > 
> > But this message is not present with the stable run that I mentioned
> > previously.
> 
> Interesting. Are those messages in the logs anywhere? It would be handy to
> include them, if possible.
> 

The messages are from before the kernel boot banner, which is the marker
we use for log inclusion (to reduce console log spam from distro
installation which uses a different kernel and thus makes debugging less
straightforward). The same EFI messages are present before the kernel
banner in the new report you linked, and with the passing job from the
previous runs as well:

EFI stub: Booting Linux Kernel... 
EFI stub: Using DTB from configuration table 
EFI stub: Exiting boot services and installing virtual address map... 
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1] 
[    0.000000] Linux version 5.11.0-rc7 (cki at runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021 
[    0.000000] efi: EFI v2.70 by EDK II 
[    0.000000] efi: SMBIOS 3.0=0x1bf760000 MEMATTR=0x1be656018 ACPI 2.0=0x1bc030000 RNG=0x1bf86cf98 MEMRESERVE=0x1bc3d3e18  
[    0.000000] efi: seeding entropy pool 
....

and

EFI stub: Booting Linux Kernel... 
EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled 
EFI stub: Using DTB from configuration table 
EFI stub: Exiting boot services and installing virtual address map... 
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1] 
[    0.000000] Linux version 5.11.0-rc7 (cki at runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021 
[    0.000000] efi: EFI v2.70 by American Megatrends 
....

The failing machine/kernel combos get stuck right after that last EFI
line before the kernel messages come in.
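
A minimal sketch of how a raw console log could be checked for this
symptom, and how it could be trimmed at the banner for inclusion in a
report. The marker strings are taken from the logs above; treating the
"Linux version" line as the banner is an assumption, and the function
names are hypothetical rather than part of any CKI tooling.

```python
EFI_EXIT = "EFI stub: Exiting boot services"
BANNER = "Linux version "  # assumed marker for the kernel boot banner


def trim_to_banner(lines):
    """Return the log from the first kernel banner on, or [] if absent."""
    for i, line in enumerate(lines):
        if BANNER in line:
            return lines[i:]
    return []


def hung_after_efi_stub(lines):
    """True if the last EFI stub exit line is never followed by a banner.

    The last occurrence is used because the console log may also contain
    an earlier boot of the distro installation kernel.
    """
    last_exit = None
    for i, line in enumerate(lines):
        if EFI_EXIT in line:
            last_exit = i
    if last_exit is None:
        return False
    return not any(BANNER in line for line in lines[last_exit + 1:])
```

A log that ends at the "Exiting boot services" line is classified as hung,
while the two passing logs above are not.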


Let me know if I should test some other versions or if you need some
other information!

Veronika

> Cheers,
> 
> Will
> 
> 
> 



