[PATCH V3] panic: Move panic_print before kmsg dumpers
Baoquan He
bhe at redhat.com
Sat Jan 29 00:00:27 PST 2022
On 01/26/22 at 12:51pm, Petr Mladek wrote:
> On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> > From: Baoquan He <bhe at redhat.com> Sent: Friday, January 21, 2022 8:34 PM
> > >
> > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > From: Baoquan He <bhe at redhat.com> Sent: Thursday, January 20, 2022 6:31 PM
> > > > >
> > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > Hi Baoquan, some comments inline below:
> > > > > >
> > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> >
> > [snip]
> >
> > > > > > Do you think it should be necessary?
> > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > might have a much clear code.
> > > > >
> > > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > > leave it to user to decide.
> > > > >
> > > >
> > > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > > kdump kernel is necessary for correctness. During initial boot of the
> > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > may write to. A VMbus connection is also established. Before kexec'ing
> > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > and the VMbus connection must be terminated. If this isn't done, the
> > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > physical memory pages get used for something else.
> >
> > In the Azure cloud, collecting data before crash dumps is a motivation
> > as well for setting crash_kexec_post_notifiers to true. That way as
> > cloud operator we can see broad failure trends, and in specific cases
> > customers often expect the cloud operator to be able to provide info
> > about a problem even if they have taken a kdump. Where did you
> > envision adding a comment in the code to help clarify these intentions?
> >
> > I looked at the code again, and should revise my previous comments
> > somewhat. The Hyper-V resets that I described indeed must be done
> > prior to kexec'ing the kdump kernel. Most such resets are actually
> > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > panic notifier. However, the Hyper-V panic notifier must terminate the
> > VMbus connection, because that must be done even if kdump is not
> > being invoked. See commit 74347a99e73.
> >
> > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > were probably due to the machine_crash_shutdown() path, and not due
> > to running the panic notifiers prior to kexec'ing the kdump kernel. The
> > exception is terminating the VMbus connection, which had problems that
> > are hopefully now fixed because of adding a timeout.
>
> My undestanding is that we could split the actions into three groups:
>
> 1. Actions that has to be before kexec'ing kdump kernel, like
> resetting physicall memory shared with Hyper-V.
>
> These operation(s) are needed only for kexec and can be done
> in kexec.
>
>
> 2. Notify Hyper-V so that, for example, Azure cloud, could collect
> data before crash dump.
>
> It is nice to have.
>
> It should be configurable if it is not completely safe. I mean
> that there should be a way to disable it when it might increase
> the risk that kexec'ing kdump kernel might fail.
>
>
> 3. Some actions are needed only when panic() ends up in the
> infinite loop.
>
> For example, unloading vmbus channel. At least the commit
> 74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
> hv panic callback") says that it is done in kdump path
> out of box.
>
> All these operations are needed and used only when the kernel is
> running under Hyper-V.
>
> My mine intention is to understand if we need 2 or 3 notifier lists
> or the current one is enough.
>
> The 3 notifier lists would be:
>
> + always do (even before kdump)
> + optionally do before or after kdump
> + needed only when kdump is not called
Totally agree with above suggesitons for Hyper-V. Cleanup as ablove
seems necesary. Stuffing them into panic_notifiers package is not
appropriate.
More information about the kexec
mailing list