❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)

Will Deacon will at kernel.org
Wed Feb 10 12:31:34 EST 2021


On Wed, Feb 10, 2021 at 12:07:23PM -0500, Veronika Kabatova wrote:
> > On Wed, Feb 10, 2021 at 10:24:31AM -0500, Veronika Kabatova wrote:
> > > Hi, I have a few results back:
> > > 
> > > - resubmitted the same kernel: gets stuck in the same spot
> > > - tried the new version pushed today: gets stuck in the same spot
> > 
> > That's odd, as I just received a pass report for that branch!
> > 
> > https://lore.kernel.org/r/cki.598435E2D5.M3C5MKJ1NV@redhat.com
> > 
> > Is it just flakey, perhaps? Obviously, that's not great either, but it will
> > make bisection more challenging.
> > 
> 
> We have a large number of machines (both physical and virtual) and it's
> impossible to run all tests on all of them, so they are randomly picked as
> long as they fit the distro and test requirements. The distribution for
> ARM tree is 1 physical and 1 "any" machine (which usually ends up being
> virtual). The jobs from the report you linked ran on different machines
> and didn't pick the one that failed to boot previously, so I manually
> forced my testing to pick that machine to eliminate some variables.

Ah thanks, I hadn't twigged that it was a different set of hosts each time.
Makes sense.

> The machine in question can of course be somewhat flaky (hard to eliminate
> that possibility completely), but I checked our historical data and it
> didn't fail to boot a single time other than with these two new kernels.

So the first thing we should probably try is whether vanilla -rc7 fails on
the machine causing us problems. If it does, then the arm64 queue for 5.12
is out of the equation; if not, then we can try a targeted bisection.

Would you be able to try v5.11-rc7 please?

> > > - tried the version from last week: boots ok
> > >
> > > There is an extra message from the run that managed to boot, which is not
> > > present with any of the runs that failed:
> > > 
> > > EFI stub: ERROR: FIRMWARE BUG: efi_loaded_image_t::image_base has bogus
> > > value
> > > 
> > > But this message is not present with the stable run that I mentioned
> > > previously.
> > 
> > Interesting. Are those messages in the logs anywhere? It would be handy to
> > include them, if possible.
> > 
> 
> The messages are from before the kernel boot banner which is the marker
> we use for log inclusion (to reduce console log spam from distro
> installation which uses a different kernel and thus makes debugging less
> straightforward). The same EFI messages are present before the kernel
> banner in the new report you linked, and with the passing job from the
> previous runs as well:
> 
> EFI stub: Booting Linux Kernel... 
> EFI stub: Using DTB from configuration table 
> EFI stub: Exiting boot services and installing virtual address map... 
> [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1] 
> [    0.000000] Linux version 5.11.0-rc7 (cki at runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021 
> [    0.000000] efi: EFI v2.70 by EDK II 
> [    0.000000] efi: SMBIOS 3.0=0x1bf760000 MEMATTR=0x1be656018 ACPI 2.0=0x1bc030000 RNG=0x1bf86cf98 MEMRESERVE=0x1bc3d3e18  
> [    0.000000] efi: seeding entropy pool 
> ....
> 
> and
> 
> EFI stub: Booting Linux Kernel... 
> EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled 
> EFI stub: Using DTB from configuration table 
> EFI stub: Exiting boot services and installing virtual address map... 
> [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1] 
> [    0.000000] Linux version 5.11.0-rc7 (cki at runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021 
> [    0.000000] efi: EFI v2.70 by American Megatrends 
> ....
> 
> The failing machine/kernel combos get stuck right after that last EFI
> line before the kernel messages come in.

Sorry, just to be clear here: do we always fail when we have the "KASLR
will be disabled" message, or do some machines pass with that?
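
If it helps, here is a rough sketch of how that could be checked across saved
console logs. It is only an illustration: the function name is made up, and it
assumes each log is the raw console output of the test kernel's boot alone
(including the EFI stub lines printed before the boot banner), not the distro
installation kernel.

```python
import re

# Message quoted in the second log excerpt above.
KASLR_MSG = "EFI_RNG_PROTOCOL unavailable, KASLR will be disabled"

# First kernel message printed after the EFI stub hands over, as in the logs above.
BANNER_RE = re.compile(
    r"^\[\s*\d+\.\d+\] Booting Linux on physical CPU", re.MULTILINE
)


def classify_console_log(text):
    """Return (kaslr_disabled, reached_kernel) for one raw console log.

    Hypothetical helper: kaslr_disabled is True when the EFI stub printed the
    KASLR message; reached_kernel is True when the kernel boot banner appears.
    """
    kaslr_disabled = KASLR_MSG in text
    reached_kernel = BANNER_RE.search(text) is not None
    return kaslr_disabled, reached_kernel
```

A log that yields (True, True) would mean a machine printed the KASLR message
and still booted; if every log with the message yields (True, False), the
message and the hang go together.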

Thanks again,

Will


