Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)

Veronika Kabatova vkabatov at redhat.com
Mon Feb 15 08:13:18 EST 2021



----- Original Message -----
> From: "Veronika Kabatova" <vkabatov at redhat.com>
> To: "Will Deacon" <will at kernel.org>
> Cc: "catalin marinas" <catalin.marinas at arm.com>, linux-arm-kernel at lists.infradead.org, "CKI Project"
> <cki-project at redhat.com>
> Sent: Thursday, February 11, 2021 1:25:34 PM
> Subject: Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)
> 
> 
> 
> ----- Original Message -----
> > From: "Will Deacon" <will at kernel.org>
> > To: "Veronika Kabatova" <vkabatov at redhat.com>
> > Cc: "catalin marinas" <catalin.marinas at arm.com>,
> > linux-arm-kernel at lists.infradead.org, "CKI Project"
> > <cki-project at redhat.com>
> > Sent: Thursday, February 11, 2021 12:50:50 PM
> > Subject: Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)
> > 
> > On Thu, Feb 11, 2021 at 05:46:02AM -0500, Veronika Kabatova wrote:
> > > > On Wed, Feb 10, 2021 at 08:17:53PM +0000, Will Deacon wrote:
> > > > > On Wed, Feb 10, 2021 at 02:31:45PM -0500, Veronika Kabatova wrote:
> > > > > > > > > > > The machine in question can of course be somewhat flaky (hard
> > > > > > > > > > > to eliminate that possibility completely), but I checked our
> > > > > > > > > > > historical data and it didn't fail to boot a single time
> > > > > > > > > > > other than with these two new kernels.
> > > > > > > > > 
> > > > > > > > > > So the first thing we should probably try is whether vanilla
> > > > > > > > > > -rc7 fails on the machine causing us problems. If it does, then
> > > > > > > > > > the arm64 queue for 5.12 is out of the equation; if not, then we
> > > > > > > > > > can try a targeted bisection.
> > > > > > > > > 
> > > > > > > > > Would you be able to try v5.11-rc7 please?
> > > > > > > > 
> > > > > > > > Can do.
> > > > > > > 
> > > > > > > Brill, thanks.
> > > > > > > 
> > > > > > > > > Someone snatched up the machine in the meantime and appears to
> > > > > > > > > have it reserved till next Monday :/ I sincerely hope that they'll
> > > > > > > > > release it sooner, and I have queued up the test job with high
> > > > > > > > > priority. Will let you know as soon as I get the results.
> > > > > > > 
> > > > > > > > Just tell them the machine is broken and they really don't want to
> > > > > > > > use it for anything important ;)
> > > > > > > 
> > > > > > 
> > > > > > > Our wishes were granted by the lab fairy and the machine was returned
> > > > > > > rather quickly :)
> > > > > > 
> > > > > > The 5.11-rc7 kernel boots.
> > > > > 
> > > > > > Fantastic! Then it's something in the arm64 for-next/core tree that was
> > > > > > added since then. The diff isn't huge and one change stands out for me,
> > > > > > so let me try reverting that and I'll update the branch...
> > > > 
> > > > Ok, I updated for-kernelci so that it contains the arm64 for-next/core
> > > > branch merged into -rc7, but with a couple of patches reverted on top.
> > > > 
> > > > > HEAD is e56137cc7606 ("Revert "arm64/mm: Fix pfn_valid() for ZONE_DEVICE
> > > > > based memory""). Please can you try that?
> > > > 
> > > 
> > > > I didn't actually have to, the autopick picked the machine for the testing
> > > > and it happily booted :) \o/
> > 
> > Phew, so I'll drop those from linux-next as well. Do you know if "earlycon"
> > works on the problematic machine? If possible, it would be helpful to try
> > booting the bad kernel with earlycon on the cmdline to see if it manages to
> > say anything in its dying breath.
> > 
> 
> No idea, we don't have the option enabled and I can't find any information
> about it. I submitted a new job with the option and let's see whether it
> works or not. The machine is reserved again so there may be some delays.
> 

I finally got access to the machine and don't have any good news. The hang
happens before early printk takes effect. The output is identical to what
I sent before, with nothing resembling kernel log lines in there.


Veronika

> > > Apparently IT cut off our email sending access because we're sending too
> > > many emails so that's why you don't have the report yet. When we work
> > > around it I'll make sure to retrigger the sending so you have it.
> > 
> > That's nice of them...
> > 
> 
> Update for the drama-loving folks: IT is actually innocent in this.
> Someone deployed a misconfigured application in the same cluster as our
> reporting system, and this application is sending out emails every
> 3 seconds. We're collateral damage. We're working on a resolution, but
> right now it seems that the situation is under control.
> 
> 
> Veronika
> 
> > Will
> > 
> > 
> 
> 



