[PATCH V3] panic: Move panic_print before kmsg dumpers

Fri Jan 28 10:24:55 PST 2022

From: Baoquan He <bhe at redhat.com> Sent: Friday, January 28, 2022 1:03 AM
> 
> On 01/24/22 at 04:57pm, Michael Kelley (LINUX) wrote:
> > From: Baoquan He <bhe at redhat.com> Sent: Friday, January 21, 2022 8:34 PM
> > >
> > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > From: Baoquan He <bhe at redhat.com> Sent: Thursday, January 20, 2022 6:31 PM
> > > > >
> > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > Hi Baoquan, some comments inline below:
> > > > > >
> > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> >
> > [snip]
> >
> > > > > > Do you think it should be necessary?
> > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > > "crash_kexec_post_notifiers", then we pursue Petr's suggestion of
> > > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > > might have much clearer code.
> > > > >
> > > > > > I haven't read Petr's reply in the other panic notifier filter thread. The
> > > > > > panic notifier is only forced on by default on the Hyper-V platform; except
> > > > > > for that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > > it. And we got a bug report on the Hyper-V issue. In our internal discussion,
> > > > > > we strongly suggest that the Hyper-V developers change the default enablement
> > > > > > and instead leave it to the user to decide.
> > > > >
> > > >
> > > > Regarding Hyper-V:   Invoking the Hyper-V notifier prior to running the
> > > > kdump kernel is necessary for correctness.  During initial boot of the
> > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > may write to.   A VMbus connection is also established. Before kexec'ing
> > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > and the VMbus connection must be terminated.   If this isn't done, the
> > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > physical memory pages get used for something else.
> > > >
> > > > I hope we've found and fixed all the problems where the Hyper-V
> > > > notifier could get hung.  Unfortunately, the Hyper-V interfaces were
> > > > designed long ago without the Linux kexec scenario in mind, and they
> > > > don't provide a simple way to reset everything except by doing a
> > > > reboot that goes back through the virtual BIOS/UEFI.  So the Hyper-V
> > > > notifier code is more complicated than would be desirable, and in
> > > > particular, terminating the VMbus connection is tricky.
> > > >
> > > > This has been an evolving area of understanding.  It's only been the last
> > > > couple of years that we've fully understood the implications of these
> > > > shared memory pages on the kexec/kdump scenario and what it takes
> > > > to reset everything so the kexec'ed kernel will work.
> > >
> > > Glad to know these background details, thanks, Michael. However, from the
> > > commit which introduced it and the code comment above the code, I thought
> > > Hyper-V wants to collect data before the crash dump. If this is true,
> > > it might be helpful to add these details to the commit log or as a code
> > > comment, which would also help to defend you when people question it.
> > >
> > > int __init hv_common_init(void)
> > > {
> > >         int i;
> > >
> > >         /*
> > >          * Hyper-V expects to get crash register data or kmsg when
> > >          * crash enlightment is available and system crashes. Set
> > >          * crash_kexec_post_notifiers to be true to make sure that
> > >          * calling crash enlightment interface before running kdump
> > >          * kernel.
> > >          */
> > >         if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
> > >                 crash_kexec_post_notifiers = true;
> > >
> > > 	......
> > > }
> >
> > In the Azure cloud, collecting data before crash dumps is a motivation
> > as well for setting crash_kexec_post_notifiers to true.   That way, as the
> > cloud operator, we can see broad failure trends, and in specific cases
> > customers often expect the cloud operator to be able to provide info
> > about a problem even if they have taken a kdump.  Where did you
> > envision adding a comment in the code to help clarify these intentions?
> >
> > I looked at the code again, and should revise my previous comments
> > somewhat.   The Hyper-V resets that I described indeed must be done
> > prior to kexec'ing the kdump kernel.   Most such resets are actually
> > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > panic notifier. However, the Hyper-V panic notifier must terminate the
> > VMbus connection, because that must be done even if kdump is not
> > being invoked.  See commit 74347a99e73.
> >
> > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > were probably due to the machine_crash_shutdown() path, and not due
> > to running the panic notifiers prior to kexec'ing the kdump kernel.  The
> > exception is terminating the VMbus connection, which had problems that
> > are hopefully now fixed because of adding a timeout.
> 
> Thanks for the detailed information.
> 
> So I can understand the status as:
> ===
> Hyper-V needed panic_notifier to execute before __crash_kexec() in
> the past because the VMbus connection needed to be terminated; that was done
> in commit 74347a99e73 as a workaround when a panic happened, whether kdump was
> enabled or not. But now, the VMbus connection termination is not needed
> anymore since it was fixed by adding a timeout on Hyper-V.

No.  Sorry I wasn't clear.  Even now, specific action needs to be taken to
terminate the VMbus connection before __crash_kexec() runs so that
the new kdump kernel can start fresh and establish its own VMbus
connection.  You had originally mentioned hang problems occurring
because of running the Hyper-V panic notifier before __crash_kexec().
Terminating the VMbus connection involves waiting for a reply from Hyper-V,
and that reply can take a while (tens of seconds) if Hyper-V has a lot of
disk data cached.  Dirty data must be flushed back
to a cloud disk before the kdump kernel runs (otherwise other weird stuff
happens in the kdump kernel).  We've added a timeout in Linux so that if
for whatever reason Hyper-V fails to reply, __crash_kexec() still gets called.
Hopefully that timeout cures any hang problems that were previously
seen.  But the timeout does not remove the need to terminate the
VMbus connection.
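
To illustrate the ordering described above, here is a rough userspace
sketch, not the actual kernel code.  Every name in it (model_vmbus_unload(),
model_crash_kexec(), model_panic(), host_replied(), UNLOAD_TIMEOUT_POLLS)
is hypothetical and only models the idea: the notifier always attempts the
unload, waits a bounded time for the host's reply, and the kexec step is
reached whether or not that reply arrives.

```c
/* Userspace model only; none of these functions exist in the kernel. */
#include <stdbool.h>

#define UNLOAD_TIMEOUT_POLLS 100	/* hypothetical poll budget */

/*
 * Hypothetical stand-in for "has the host replied to the unload request?".
 * reply_at_poll < 0 models a host that never replies.
 */
static bool host_replied(int poll, int reply_at_poll)
{
	return reply_at_poll >= 0 && poll >= reply_at_poll;
}

/*
 * Model of the panic notifier: request the unload, then poll for the
 * reply a bounded number of times.  Returns true if the reply arrived
 * within the budget, false if the wait timed out.
 */
static bool model_vmbus_unload(int reply_at_poll)
{
	for (int poll = 0; poll < UNLOAD_TIMEOUT_POLLS; poll++) {
		if (host_replied(poll, reply_at_poll))
			return true;
		/* real code would delay between polls here */
	}
	return false;
}

/* Stand-in for the kexec step; returns true to show it was reached. */
static bool model_crash_kexec(void)
{
	return true;
}

/*
 * Model of the panic path with the notifier run before the kexec.
 * The unload is always attempted; the kexec step is always reached.
 */
static bool model_panic(int reply_at_poll, bool *clean_unload)
{
	*clean_unload = model_vmbus_unload(reply_at_poll);
	return model_crash_kexec();
}
```

In this model, calling model_panic() with a negative reply_at_poll (a host
that never replies) still returns true, with *clean_unload set to false.
That is the point of the timeout: it bounds the wait, but the unload
attempt itself is not skipped.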

Michael

> 
> Then, in the current kernel, panic_notifier is set to execute on Hyper-V
> by default for just one reason: Hyper-V wants to collect data
> before the crash dump. The data collection is motivated by wanting to see
> broad failure trends as the cloud operator on Azure, and in specific
> cases providing info to customers even if they have taken a vmcore.
> ===
> 
> Do I get it right?