[REGRESSION][BISECTED] Cannot boot Lichee Pi 4A with FORTIFY_SOURCE enabled

Jason Montleon jmontleo at redhat.com
Thu Oct 3 10:12:59 PDT 2024


On Thu, Oct 3, 2024 at 10:41 AM Alexandre Ghiti <alex at ghiti.fr> wrote:
>
> Hi Jason,
>
> On 02/10/2024 17:14, Jason Montleon wrote:
> > On Tue, Oct 1, 2024 at 10:53 AM Alexandre Ghiti <alex at ghiti.fr> wrote:
> >> Hi Jason,
> >>
> >> On 25/09/2024 16:32, Jason Montleon wrote:
> >>> On Tue, Sep 24, 2024 at 1:36 PM Kees Cook <kees at kernel.org> wrote:
> >>>>
> >>>> On September 24, 2024 8:58:57 AM PDT, Jason Montleon <jmontleo at redhat.com> wrote:
> >>>>> On Sun, Sep 22, 2024 at 6:38 PM Kees Cook <kees at kernel.org> wrote:
> >>>>>> Can you try this patch? It should avoid using the "WARN" infrastructure
> >>>>>> (if that is the source of blocking boot), but should still provide some
> >>>>>> detail about what tripped it up (via the "regular" pr_*() logging). And
> >>>>>> if it boots, can you look at the log to find what it reports for the
> >>>>>> "Wanted to write ..." line(s)?
> >>>>>>
> >>>>>> Thanks!
> >>>>>>
> >>>>>> -Kees
> >>>>>>
> >>>>>>
> >>>>>> diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
> >>>>>> index 0d99bf11d260..469a3439959d 100644
> >>>>>> --- a/include/linux/fortify-string.h
> >>>>>> +++ b/include/linux/fortify-string.h
> >>>>>> @@ -549,7 +549,8 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
> >>>>>>                                            const size_t q_size,
> >>>>>>                                            const size_t p_size_field,
> >>>>>>                                            const size_t q_size_field,
> >>>>>> -                                        const u8 func)
> >>>>>> +                                        const u8 func,
> >>>>>> +                                        const char *field)
> >>>>>>    {
> >>>>>>           if (__builtin_constant_p(size)) {
> >>>>>>                   /*
> >>>>>> @@ -610,8 +611,13 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
> >>>>>>            * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
> >>>>>>            */
> >>>>>>           if (p_size_field != SIZE_MAX &&
> >>>>>> -           p_size != p_size_field && p_size_field < size)
> >>>>>> -               return true;
> >>>>>> +           p_size != p_size_field && p_size_field < size) {
> >>>>>> +               if (p_size_field == 0)
> >>>>>> +                       pr_warn("Wanted to write %zu to a 0-sized destination: %s\n",
> >>>>>> +                               size, field);
> >>>>>> +               else
> >>>>>> +                       return true;
> >>>>>> +       }
> >>>>>>
> >>>>>>           return false;
> >>>>>>    }
> >>>>>> @@ -625,7 +631,8 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
> >>>>>>           const size_t __q_size_field = (q_size_field);                   \
> >>>>>>           fortify_warn_once(fortify_memcpy_chk(__fortify_size, __p_size,  \
> >>>>>>                                        __q_size, __p_size_field,          \
> >>>>>> -                                    __q_size_field, FORTIFY_FUNC_ ##op), \
> >>>>>> +                                    __q_size_field, FORTIFY_FUNC_ ##op,\
> >>>>>> +                                    #p " at " FILE_LINE), \
> >>>>>>                     #op ": detected field-spanning write (size %zu) of single %s (size %zu)\n", \
> >>>>>>                     __fortify_size,                                       \
> >>>>>>                     "field \"" #p "\" at " FILE_LINE,                     \
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Kees Cook
> >>>>>>
> >>>>> Hi Kees,
> >>>>> Sorry for the delay in responding. Your patch also caused the system
> >>>>> to hang booting, but I took it as inspiration to try some things.
> >>>>> TL;DR I see these two print several times:
> >>>>>
> >>>>> Wanted to write 8 to a 0-sized destination: oldptr at
> >>>>> arch/riscv/errata/thead/errata.c:185
> >>>>> Wanted to write 28 to a 0-sized destination: oldptr at
> >>>>> arch/riscv/errata/thead/errata.c:185
> >>>>>
> >>>>> Since I suspected CONFIG_ERRATA_THEAD_MAE which relies on
> >>>>> CONFIG_RISCV_ALTERNATIVE_EARLY I suspected it is just too early for
> >>>>> pr_*, etc. I tried trace_printk but that didn't produce any traces,
> >>>>> and then I found a comment which makes me think we're too early for
> >>>>> that too.
> >>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/kernel/alternative.c?h=v6.11#n214
> >>>>>
> >>>>> I didn't want to give up so I wrote a simple module to figure out how
> >>>>> to use sbi_ecall and then incorporated that idea into your patch,
> >>>>> although even that was not straightforward:
> >>>>> https://gist.github.com/jmontleon/f3536ad70dc92bed992139ad5b266d5c
> >>>> Oh nice work! Probably something riscv-specific needs to be wired up to the printk routines to do what you have there until printk is fully available  But that's a separate issue, I guess.
> >>>>
> >>>>> Full output: https://gist.github.com/jmontleon/823facff38e28b717f153e38adcfd7fe
> >>>> Does the system boot with your patch?
> >>> Yes. It boots fully. It also booted with trace_printk, it just didn't
> >>> produce a trace, which again I think is expected at that stage. It
> >>> seems as long as there are no 'normal' print attempts, pr_()*, etc. it
> >>> works.
> >>
> >> Hmmm I don't see what in your patch would fix your issue.
> >>
> >> I sent a patch
> >> (https://lore.kernel.org/linux-riscv/20240829165048.49756-1-alexghiti@rivosinc.com/)
> >> that was merged in v6.11-rc7 which fixes something similar, could that
> >> be what fixed your problem?
> >>
> > Hi Alex, Thank you for having a look.
> >
> > Although I narrowed it down to starting with 6.11-rc1, the first time
> > I encountered this was actually with an rc7 Fedora build.
> > http://riscv.rocks/koji/buildinfo?buildID=338433
> >
> > I also did a build from upstream 6.11.1 last night and still see the
> > problem trying to boot with it this morning.
> >
> >> The problem with this early code is that the MMU is not enabled yet and
> >> if your kernel is relocatable, the relocations are set "virtually" and
> >> then without the MMU, a call to an external function results in a
> >> trap...Which happens way too often, I have a few ideas to fix that, I
> >> need to find time to tackle this though.
> >>
> >> I don't know FORTIFY_SOURCE, but I guess this could cause a call to an
> >> external function and then cause the issue I describe above right? We
> >> already remove all instrumentations from this early code so maybe we
> >> should remove FORTIFY_SOURCE too?
>
>
> So I was able to reproduce the issue on qemu by adding a few tweaks, and
> indeed we trap in __warn_printk() on a virtual address but MMU is not
> enabled yet.
>
> The following diff though allows me to pass this failure but I can't get
> much further in the boot since the tweaks I added won't allow it, can
> you give the following a try?
>
> diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
> index 8a27394851233..4913f3b3f198c 100644
> --- a/arch/riscv/errata/Makefile
> +++ b/arch/riscv/errata/Makefile
> @@ -2,6 +2,10 @@ ifdef CONFIG_RELOCATABLE
>   KBUILD_CFLAGS += -fno-pie
>   endif
>
> +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY
> +KBUILD_CFLAGS += -D__NO_FORTIFY
> +endif
> +
>   obj-$(CONFIG_ERRATA_ANDES) += andes/
>   obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
>   obj-$(CONFIG_ERRATA_THEAD) += thead/

Yes, this worked.

Thank you,
Jason

> Thanks,
>
> Alex
>
>
> >>
> > What you are describing here does not seem unreasonable to me, but I
> > simply do not know enough to say one way or the other.
> > Is there something I can do to collect more information to determine
> > if this is the cause?
> >
> > Thank you again for having a look,
> > Jason
> >
> >> Thanks for digging into this,
> >>
> >> Alex
> >>
> >>
> >>>> I would guess that this line at arch/riscv/errata/thead/errata.c:168:
> >>>>
> >>>> void *oldptr, *altptr;
> >>>>
> >>>> Should be:
> >>>>
> >>>> u8 *oldptr, *altptr;
> >>> I tried this, but it didn't appear to make a difference, I will try
> >>> some more today, though I admit to being out of my element.
> >>>
> >>>
> >>>
> >>>> But maybe __ALT_PTR needs adjustment too? Hmm. I would have expected void * to have a size of SIZE_MAX here, though, so something else might be confusing the check.
> >>>>
> >>>>
> >>>> --
> >>>> Kees Cook
> >>>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv at lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>




More information about the linux-riscv mailing list