Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)

Veronika Kabatova vkabatov at redhat.com
Wed Feb 10 13:06:08 EST 2021



----- Original Message -----
> From: "Will Deacon" <will at kernel.org>
> To: "Veronika Kabatova" <vkabatov at redhat.com>
> Cc: "catalin marinas" <catalin.marinas at arm.com>, "CKI Project" <cki-project at redhat.com>,
> linux-arm-kernel at lists.infradead.org
> Sent: Wednesday, February 10, 2021 6:31:34 PM
> Subject: Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)
> 
> On Wed, Feb 10, 2021 at 12:07:23PM -0500, Veronika Kabatova wrote:
> > > On Wed, Feb 10, 2021 at 10:24:31AM -0500, Veronika Kabatova wrote:
> > > > Hi, I have a few results back:
> > > > 
> > > > - resubmitted the same kernel: gets stuck in the same spot
> > > > - tried the new version pushed today: gets stuck in the same spot
> > > 
> > > That's odd, as I just received a pass report for that branch!
> > > 
> > > https://lore.kernel.org/r/cki.598435E2D5.M3C5MKJ1NV@redhat.com
> > > 
> > > Is it just flakey, perhaps? Obviously, that's not great either, but
> > > it will make bisection more challenging.
> > > 
> > 
> > We have a large number of machines (both physical and virtual) and it's
> > impossible to run all tests on all of them, so they are randomly picked as
> > long as they fit the distro and test requirements. The distribution for
> > the ARM tree is 1 physical and 1 "any" machine (which usually ends up being
> > virtual). The jobs from the report you linked ran on different machines
> > and didn't pick the one that failed to boot previously, so I manually
> > forced my testing to pick that machine to eliminate some variables.
> 
> Ah thanks, I hadn't twigged that it was a different set of hosts each time.
> Makes sense.
> 
> > The machine in question can of course be somewhat flaky (hard to eliminate
> > that possibility completely), but I checked our historical data and it
> > didn't fail to boot a single time other than with these two new kernels.
> 
> So the first thing we should probably try is whether vanilla -rc7 fails on
> the machine causing us problems. If it does, then the arm64 queue for 5.12
> is out of the equation; if not, then we can try a targeted bisection.
> 
> Would you be able to try v5.11-rc7 please?

Can do.

Someone snatched up the machine in the meantime and appears to have it
reserved till next Monday :/ I sincerely hope that they'll release it sooner;
I have queued up the test job with high priority. Will let you know as soon
as I get the results.

> 
> > > > - tried the version from last week: boots ok
> > > >
> > > > There is an extra message from the run that managed to boot, which is
> > > > not present with any of the runs that failed:
> > > > 
> > > > EFI stub: ERROR: FIRMWARE BUG: efi_loaded_image_t::image_base has bogus value
> > > > 
> > > > But this message is not present with the stable run that I mentioned
> > > > previously.
> > > 
> > > Interesting. Are those messages in the logs anywhere? It would be handy
> > > to include them, if possible.
> > > 
> > 
> > The messages are from before the kernel boot banner which is the marker
> > we use for log inclusion (to reduce console log spam from distro
> > installation which uses a different kernel and thus makes debugging less
> > straightforward). The same EFI messages are present before the kernel
> > banner in the new report you linked, and with the passing job from the
> > previous runs as well:
> > 
> > EFI stub: Booting Linux Kernel...
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services and installing virtual address map...
> > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1]
> > [    0.000000] Linux version 5.11.0-rc7
> > (cki at runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc
> > (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version
> > 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021
> > [    0.000000] efi: EFI v2.70 by EDK II
> > [    0.000000] efi: SMBIOS 3.0=0x1bf760000 MEMATTR=0x1be656018 ACPI
> > 2.0=0x1bc030000 RNG=0x1bf86cf98 MEMRESERVE=0x1bc3d3e18
> > [    0.000000] efi: seeding entropy pool
> > ....
> > 
> > and
> > 
> > EFI stub: Booting Linux Kernel...
> > EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services and installing virtual address map...
> > [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1]
> > [    0.000000] Linux version 5.11.0-rc7
> > (cki at runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc
> > (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version
> > 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021
> > [    0.000000] efi: EFI v2.70 by American Megatrends
> > ....
> > 
> > The failing machine/kernel combos get stuck right after that last EFI
> > line before the kernel messages come in.
> 
> Sorry, just to be clear here: do we always fail when we have the "KASLR
> will be disabled" message, or do some machines pass with that?
> 

No, we don't always fail with it. The message is sometimes present,
sometimes not (likely machine dependent), and doesn't seem to influence the
boot results. Both of the log excerpts above are from jobs that booted.
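
For anyone who wants to compare the pre-banner output across jobs, the log
splitting described earlier in the thread could be sketched roughly as below.
This is only an illustration, not the actual CKI code: the function names and
the choice of the "Linux version" line as the banner marker are assumptions.

```python
import re

# Assumption: the "boot banner" is the first "Linux version ..." console line.
BANNER = re.compile(r"^\[\s*\d+\.\d+\]\s+Linux version ", re.MULTILINE)


def split_console_log(text):
    """Split a console log into (before_banner, from_banner_onwards).

    If no banner is found (e.g. the boot hung earlier), everything counts
    as pre-banner output.
    """
    match = BANNER.search(text)
    if match is None:
        return text, ""
    return text[:match.start()], text[match.start():]


def efi_stub_messages(pre_banner):
    """Return the "EFI stub:" lines seen before the banner, in order."""
    return [line.strip() for line in pre_banner.splitlines()
            if line.strip().startswith("EFI stub:")]
```

Running efi_stub_messages() on the pre-banner part of a booting job and of a
hanging job would show whether a line such as the KASLR one differs between
them.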


Veronika

> Thanks again,
> 
> Will
> 
> 