Compile error ppc64le: Cannot find symbol for section 11: .text.unlikely.

Coiby Xu coxu at redhat.com
Wed Mar 2 16:49:12 PST 2022


On Wed, Mar 02, 2022 at 11:52:12AM +0100, Veronika Kabatova wrote:
>On Wed, Mar 2, 2022 at 8:50 AM Coiby Xu <coxu at redhat.com> wrote:
>>
>> On Fri, Feb 25, 2022 at 11:46:41AM +0800, Coiby Xu wrote:
>> >On Fri, Dec 03, 2021 at 04:54:19PM +0100, Veronika Kabatova wrote:
>> >>On Wed, Dec 1, 2021 at 3:20 AM Coiby Xu <coxu at redhat.com> wrote:
>> >>>
>> >>>On Wed, Nov 24, 2021 at 09:47:43PM +0800, Baoquan He wrote:
>> >>>>On 11/24/21 at 01:47pm, Veronika Kabatova wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> for a while we've been seeing the following error when compiling
>> >>>>> the mainline kernel with gcc 11.2 and binutils 2.37:
>> >>>>>
>> >>>>> 00:02:32 Cannot find symbol for section 11: .text.unlikely.
>> >>>>> 00:02:32 kernel/kexec_file.o: failed
>> >>>>> 00:02:32 make[3]: *** [scripts/Makefile.build:287: kernel/kexec_file.o] Error 1
>> >>>>> 00:02:32 make[3]: *** Deleting file 'kernel/kexec_file.o'
>> >>>>> 00:02:32 make[2]: *** [Makefile:1846: kernel] Error 2
>> >>>>> 00:02:32 make[2]: *** Waiting for unfinished jobs....
>> >>>>>
>> >>>>> The error only happens with ppc64le. I've tested this with cross
>> >>>>> compilation, but the only reference to the error I found suggests
>> >>>>> the same happens with the native compiles as well:
>> >>>>>
>> >>>>> https://github.com/groeck/linux-build-test/commit/142cbefbc0d37962c9a6c7f28ee415ecd5fd1e98
>> >>>>>
>> >>>>> In case it matters, the config used is the Fedora config with
>> >>>>> kselftest options enabled, which you can grab from
>> >>>>>
>> >>>>> https://gitlab.com/redhat/red-hat-ci-tools/kernel/cki-internal-pipelines/cki-trusted-contributors/-/jobs/1760752896/artifacts/raw/artifacts/kernel-mainline.kernel.org-ppc64le-e4e737bb5c170df6135a127739a9e6148ee3da82.config
>> >>>>>
>> >>>>>
>> >>>>> I've reached out to the Fedora compiler folks and Nick Clifton
>> >>>>> suggested this is a problem with the kernel:
>> >>>>>
>> >>>>>     This message comes from the recordmcount tool, which is part of the kernel
>> >>>>>     sources:
>> >>>>>
>> >>>>>     linux/scripts/recordmcount.[ch]
>> >>>>>
>> >>>>>     It appears to be triggered when a compiler update causes code to be
>> >>>>>     rearranged. The problem has been reported before in various forums,
>> >>>>>     but in particular I found this reference:
>> >>>>>
>> >>>>>     https://lore.kernel.org/lkml/20201204165742.3815221-2-arnd@kernel.org/
>> >>>>>
>> >>>>>     The point of which to me at least is that this is a kernel issue rather than
>> >>>>>     a compiler issue.  Ie there must be some weak symbols in kexec_file.o file
>> >>>>>     which need to be moved elsewhere.
>> >>>>
>> >>>>It could be arch_kexec_kernel_verify_sig() in kernel/kexec_file.c which
>> >>>>is __weak, but not implemented in any ARCH. If true, this has been
>> >>>>pointed out by Eric in one patch thread from Coiby.
>> >>>>
>> >>>>[PATCH v3 1/3] kexec: clean up arch_kexec_kernel_verify_sig
>> >>>>http://lkml.kernel.org/r/20211018083137.338757-2-coxu@redhat.com
>> >>>>
>> >>>>Maybe Coiby can fetch above config file and run the test to check.
>> >>>
>> >>>"[PATCH v3 1/3] kexec: clean up arch_kexec_kernel_verify_sig" alone
>> >>>would fix the error. If I turn arch_kexec_apply_relocations{_add,} into
>> >
>> >Sorry I meant "alone won't fix the error".
>> >
>> >>>static functions, the error goes away. Attached is a patch that
>> >>>makes this error disappear.
>> >>>
>> >>
>> >>Thank you! I can confirm the attached patch fixes the problem.
>> >>
>> >>
>> >>Veronika
>> >>
>> >>>However, s390 and x86 have their own implementations of
>> >>>arch_kexec_apply_relocations_add. This makes it look like a gcc
>> >>>issue.
>> >
>> >Based on the above point and further investigation, I think the root cause is
>> >find_secsym_ndx in linux/scripts/recordmcount.h,
>> > /*
>> >  * Find a symbol in the given section, to be used as the base for relocating
>> >  * the table of offsets of calls to mcount.  A local or global symbol suffices,
>> >  * but avoid a Weak symbol because it may be overridden; the change in value
>> >  * would invalidate the relocations of the offsets of the calls to mcount.
>> >  * Often the found symbol will be the unnamed local symbol generated by
>> >  * GNU 'as' for the start of each section.  For example:
>> >  *    Num:    Value  Size Type    Bind   Vis      Ndx Name
>> >  *      2: 00000000     0 SECTION LOCAL  DEFAULT    1
>> >  */
>> > static int find_secsym_ndx(unsigned const txtndx,
>> >                               char const *const txtname,
>> >                               uint_t *const recvalp,
>> >                               unsigned int *sym_index,
>> >                               Elf_Shdr const *const symhdr,
>> >                               Elf32_Word const *symtab,
>> >                               Elf32_Word const *symtab_shndx,
>> >                               Elf_Ehdr const *const ehdr)
>> > {
>> >        ...
>> >               if (txtndx == get_symindex(symp, symtab, symtab_shndx)
>> >                       /* avoid STB_WEAK */
>> >
>> >        fprintf(stderr, "Cannot find symbol for section %u: %s.\n",
>> >               txtndx, txtname);
>> >
>> >This function prints the above message after failing to find
>> >arch_kexec_kernel_verify_sig or arch_kexec_apply_relocations{_add,} in
>> >section 11 (.text.unlikely), because it ignores weak symbols and ppc64le
>> >doesn't have its own arch implementations of these functions. I'll see if
>> >I can fix it in linux/scripts/recordmcount.h.
>>
>> After digging deeper into linux/scripts/recordmcount.h, I think this
>> issue can be fixed in either the compiler or recordmcount, so I filed two bugs:
>> - gcc: https://bugzilla.redhat.com/show_bug.cgi?id=2059838
>
>Hi,
>
>I have also opened a BZ for gcc some time ago and that is where I
>was redirected to this mailing list, linking it here if it helps:
>
>https://bugzilla.redhat.com/show_bug.cgi?id=2022470

Hi,

Thanks for the info. Sorry, I didn't notice this bug. I will use
bz2059838, though, since I have already provided nearly decisive evidence
there that something is wrong with Fedora's gcc.
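
To illustrate the failure mode discussed in the quoted messages, here is a
minimal, self-contained C sketch. It is not the real recordmcount code: the
toy_* names, the simplified symbol record, the toy_strong_func entry and the
placement of the weak symbols in section 11 are all assumptions made for
illustration. Only the idea of skipping weak symbols when looking for a base
symbol in a section is taken from find_secsym_ndx.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for ELF symbol bindings (hypothetical names). */
enum toy_bind { TOY_LOCAL = 0, TOY_GLOBAL = 1, TOY_WEAK = 2 };

/* Simplified symbol record; NOT the real Elf_Sym layout. */
struct toy_sym {
	const char *name;
	unsigned int shndx;	/* index of the section holding the symbol */
	enum toy_bind bind;
};

/*
 * Assumed symbol table: section 11 contains only weak symbols, while
 * section 1 contains one made-up global symbol.
 */
const struct toy_sym toy_syms[] = {
	{ "toy_strong_func",                   1, TOY_GLOBAL },
	{ "arch_kexec_kernel_verify_sig",     11, TOY_WEAK },
	{ "arch_kexec_apply_relocations",     11, TOY_WEAK },
	{ "arch_kexec_apply_relocations_add", 11, TOY_WEAK },
};
const size_t toy_nsyms = sizeof(toy_syms) / sizeof(toy_syms[0]);

/*
 * Return the index of the first non-weak symbol in section txtndx,
 * or -1 if the section has no such symbol.
 */
int toy_find_secsym(const struct toy_sym *syms, size_t nsyms,
		    unsigned int txtndx)
{
	assert(syms != NULL);

	for (size_t i = 0; i < nsyms; i++) {
		if (syms[i].shndx != txtndx)
			continue;	/* symbol lives in another section */
		if (syms[i].bind == TOY_WEAK)
			continue;	/* weak symbols may be overridden */
		return (int)i;
	}
	return -1;
}
```

With this table, toy_find_secsym(toy_syms, toy_nsyms, 11) returns -1, which
corresponds to the case where recordmcount gives up with "Cannot find symbol
for section", while the same call for section 1 returns index 0.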

>
>
>Veronika
>
>> - linux/scripts/recordmcount.h: https://bugzilla.redhat.com/show_bug.cgi?id=2059842
>>
>> >
>> >>>
>> >>>
>> >>>>
>> >>>>Thanks
>> >>>>Baoquan
>> >>>>
>> >>>
>> >>>--
>> >>>Best regards,
>> >>>Coiby
>> >>
>> >
>> >--
>> >Best regards,
>> >Coiby
>>
>> --
>> Best regards,
>> Coiby
>>
>

-- 
Best regards,
Coiby



