[PATCH v3] kexec: Support purgatories with .text.hot sections

Ricardo Ribalda ribalda at chromium.org
Mon Mar 27 04:52:08 PDT 2023


Hi Philipp



On Fri, 24 Mar 2023 at 17:00, Philipp Rudo <prudo at redhat.com> wrote:
>
> Hi Ricardo,
>
> On Wed, 22 Mar 2023 20:09:21 +0100
> Ricardo Ribalda <ribalda at chromium.org> wrote:
>
> > With Clang 16, the purgatory text ends up in two sections:
> >
> >   [ 1] .text             PROGBITS         0000000000000000  00000040
> >        00000000000011a1  0000000000000000  AX       0     0     16
> >   [ 2] .rela.text        RELA             0000000000000000  00003498
> >        0000000000000648  0000000000000018   I      24     1     8
> >   ...
> >   [17] .text.hot.        PROGBITS         0000000000000000  00003220
> >        000000000000020b  0000000000000000  AX       0     0     1
> >   [18] .rela.text.hot.   RELA             0000000000000000  00004428
> >        0000000000000078  0000000000000018   I      24    17     8
> >
> > Both of them have a range [sh_addr, sh_addr + sh_size) that contains
> > `e_entry`.
> >
> > As a result, image->start is calculated twice: once for .text and
> > once more for .text.hot. The second calculation leaves image->start
> > pointing to a random location.
> >
> > Because of this, the system crashes immediately after:
> >
> > kexec_core: Starting new kernel
>
> Great analysis!
>
> > Signed-off-by: Ricardo Ribalda <ribalda at chromium.org>
> > ---
> > kexec: Fix kexec_file_load for llvm16
> >
> > When upgrading LLVM, I realised that kexec stopped working on my test
> > platform. This patch fixes it.
> >
> > To: Eric Biederman <ebiederm at xmission.com>
> > Cc: Baoquan He <bhe at redhat.com>
> > Cc: Philipp Rudo <prudo at redhat.com>
> > Cc: kexec at lists.infradead.org
> > Cc: linux-kernel at vger.kernel.org
> > ---
> > Changes in v3:
> > - Fix initial value. Thanks Ross!
> > - Link to v2: https://lore.kernel.org/r/20230321-kexec_clang16-v2-0-d10e5d517869@chromium.org
> >
> > Changes in v2:
> > - Fix if condition. Thanks Steven!
> > - Update Philipp email. Thanks Baoquan.
> > - Link to v1: https://lore.kernel.org/r/20230321-kexec_clang16-v1-0-a768fc2c7c4d@chromium.org
> > ---
> >  kernel/kexec_file.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> > index f1a0e4e3fb5c..25a37d8f113a 100644
> > --- a/kernel/kexec_file.c
> > +++ b/kernel/kexec_file.c
> > @@ -901,10 +901,21 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
> >               }
> >
> >               offset = ALIGN(offset, align);
> > +
> > +             /*
> > +              * Check if the section contains the entry point and, if so,
> > +              * calculate the value of image->start based on it.
> > +              * If the compiler has produced more than one .text section
> > +              * (e.g. .text.hot), they are generally after the main .text
> > +              * section, and they shall not be used to calculate
> > +              * image->start. So do not re-calculate image->start if it
> > +              * is not set to the initial value.
> > +              */
> >               if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
> >                   pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
> >                   pi->ehdr->e_entry < (sechdrs[i].sh_addr
> > -                                      + sechdrs[i].sh_size)) {
> > +                                      + sechdrs[i].sh_size) &&
> > +                 kbuf->image->start == pi->ehdr->e_entry) {
>
> I'm not entirely sure if this is the solution to go with. As you state
> in the comment above, this solution assumes that the .text section comes
> before any other .text.* section. But this assumption isn't much
> stronger than the assumption that there is only a single .text section,
> which is what the code relies on today.
>
> The best solution I can come up with right now is to introduce a linker
> script for the purgatory that simply merges the .text sections into
> one. Similar to what I did for s390 in
> arch/s390/purgatory/purgatory.lds.S (although for a different reason).
> But that would require every architecture to get one. An alternative
> would be to find a way to get rid of the -r option on the LD_FLAGS,
> which IIRC is the reason why both sections overlap in the first place.


I tried removing the -r from arch/x86/purgatory/Makefile and that resulted in:

[  115.631578] BUG: unable to handle page fault for address: ffff93224d5c8e20
[  115.631583] #PF: supervisor write access in kernel mode
[  115.631585] #PF: error_code(0x0002) - not-present page
[  115.631586] PGD 100000067 P4D 100000067 PUD 1001ed067 PMD 132b58067 PTE 0
[  115.631589] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  115.631592] CPU: 0 PID: 5291 Comm: kexec-lite Tainted: G     U      5.15.103-17399-g852a928df601-dirty #19 cd159e0d6a91f03e06035a0a8eb7fc984a8f3e82
[  115.631594] Hardware name: Google Crota/Crota, BIOS Google_Crota.14505.288.0 11/08/2022
[  115.631595] RIP: 0010:memcpy_erms+0x6/0x10
[  115.631599] Code: 5d 00 eb bd eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 cc cc cc cc 66 90 48 89 f8 48 89 d1 <f3> a4 c3 cc cc cc cc 0f 1f 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[  115.631601] RSP: 0018:ffff93224f65fe50 EFLAGS: 00010246
[  115.631602] RAX: ffff93224d5c8e20 RBX: 00000000ffffffea RCX: 0000000000000100
[  115.631603] RDX: 0000000000000100 RSI: ffff9322407bd000 RDI: ffff93224d5c8e20
[  115.631604] RBP: ffff93224f65fe88 R08: 0000000000000000 R09: ffff92133cd3ef08
[  115.631605] R10: ffff9322407be000 R11: ffffffffa1b4f2e0 R12: 0000000000000000
[  115.631606] R13: ffff92133cee4c00 R14: 0000000000000100 R15: ffffffffa2b6f14f
[  115.631607] FS:  000078e8b9dbf7c0(0000) GS:ffff921437800000(0000) knlGS:0000000000000000
[  115.631609] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.631610] CR2: ffff93224d5c8e20 CR3: 000000015be26001 CR4: 0000000000770ef0
[  115.631611] PKRU: 55555554
[  115.631612] Call Trace:
[  115.631614]  <TASK>
[  115.631615]  kexec_purgatory_get_set_symbol+0x82/0xd3
[  115.631619]  __se_sys_kexec_file_load+0x523/0x644
[  115.631621]  do_syscall_64+0x58/0xa5
[  115.631623]  entry_SYSCALL_64_after_hwframe+0x61/0xcb


I did not pursue that direction any further.

I also tried to find an LLVM flag that would prevent .text from being
split, but had no luck there either.

I will look into writing a linker script for x86. We could combine it
with something like:

                if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
                    pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
                    pi->ehdr->e_entry < (sechdrs[i].sh_addr
-                                        + sechdrs[i].sh_size) &&
-                   kbuf->image->start == pi->ehdr->e_entry) {
-                       kbuf->image->start -= sechdrs[i].sh_addr;
-                       kbuf->image->start += kbuf->mem + offset;
+                                        + sechdrs[i].sh_size)) {
+                       if (!WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) {
+                               kbuf->image->start -= sechdrs[i].sh_addr;
+                               kbuf->image->start += kbuf->mem + offset;
+                       }
                }

That way, developers would get a hint of where to look.

Thanks!


>
> Thanks
> Philipp
>
> >                       kbuf->image->start -= sechdrs[i].sh_addr;
> >                       kbuf->image->start += kbuf->mem + offset;
> >               }
> >
> > ---
> > base-commit: 17214b70a159c6547df9ae204a6275d983146f6b
> > change-id: 20230321-kexec_clang16-4510c23d129c
> >
> > Best regards,
>


-- 
Ricardo Ribalda


