[REGRESSION] kexec does firmware reboot in kernel v6.7.6

Dave Young dyoung at redhat.com
Thu Mar 14 02:25:52 PDT 2024


On Thu, 14 Mar 2024 at 00:18, Steve Wahl <steve.wahl at hpe.com> wrote:
>
> On Wed, Mar 13, 2024 at 07:16:23AM -0500, Eric W. Biederman wrote:
> >
> > Kexec happens on identity mapped page tables.
> >
> > The files of interest are machine_kexec_64.c and relocate_kernel_64.S
> >
> > I suspect either the building of the identity mapped page table in
> > machine_kexec_prepare, or the switching to the page table in
> > identity_mapped in relocate_kernel_64.S is where something goes wrong.
> >
> > Probably in kernel_ident_mapping_init as that code is directly used
> > to build the identity mapped page tables.
> >
> > Hmm.
> >
> > Your change is commit d794734c9bbf ("x86/mm/ident_map: Use gbpages only
> > where full GB page should be mapped.")
>
> Yeah, sorry, I accidentally used the stable cherry-pick commit id that
> Pavin Joseph found with his bisect results.
>
> > Given the simplicity of that change itself, my guess is that somewhere in
> > the first 1GB there are pages that need to be mapped (like the IDT at 0)
> > that are not getting mapped.
>
> ...
>
> > It might be worth setting up early printk on some of these systems
> > and seeing if the failure is in early boot up of the new kernel (that is
> > using kexec supplied identity mapped pages) rather than in kexec per-se.
> >
> > But that is just my guess at the moment.
>
> Thanks for the input.  I was thinking in terms of running out of
> memory somewhere because we're using more page table entries than we
> used to.  But you've got me thinking that maybe some necessary region
> is not explicitly requested to be placed in the identity map, but is
> by luck included in the rounding errors when we use gbpages.

Yes, it is possible. Here is an example case:
http://lists.infradead.org/pipermail/kexec/2023-June/027301.html
The final change was to avoid doing AMD-specific things on Intel
platforms, but the mapping code itself is still not properly fixed.

>
> At any rate, since I am still unable to reproduce this for myself, I
> am going to contact Pavin Joseph off-list and see if he's willing to
> do a few debugging kernel steps for me and send me the results, to see
> if I can get this figured out.  (I believe trimming the CC list and/or
> going private is usually frowned upon for the LKML, but I think this
> is appropriate as it only adds noise for the rest.  Let me know if I'm
> wrong.)
>
> Thank you.
>
> --> Steve Wahl
>
> --
> Steve Wahl, Hewlett Packard Enterprise
>
> _______________________________________________
> kexec mailing list
> kexec at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
>
