[REGRESSION][BISECTED] Cannot boot Lichee Pi 4A with FORTIFY_SOURCE enabled

Wed Oct 2 08:14:33 PDT 2024

On Tue, Oct 1, 2024 at 10:53 AM Alexandre Ghiti <alex at ghiti.fr> wrote:
>
> Hi Jason,
>
> On 25/09/2024 16:32, Jason Montleon wrote:
> > On Tue, Sep 24, 2024 at 1:36 PM Kees Cook <kees at kernel.org> wrote:
> >>
> >>
> >> On September 24, 2024 8:58:57 AM PDT, Jason Montleon <jmontleo at redhat.com> wrote:
> >>> On Sun, Sep 22, 2024 at 6:38 PM Kees Cook <kees at kernel.org> wrote:
> >>>> Can you try this patch? It should avoid using the "WARN" infrastructure
> >>>> (if that is the source of blocking boot), but should still provide some
> >>>> detail about what tripped it up (via the "regular" pr_*() logging). And
> >>>> if it boots, can you look at the log to find what it reports for the
> >>>> "Wanted to write ..." line(s)?
> >>>>
> >>>> Thanks!
> >>>>
> >>>> -Kees
> >>>>
> >>>>
> >>>> diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
> >>>> index 0d99bf11d260..469a3439959d 100644
> >>>> --- a/include/linux/fortify-string.h
> >>>> +++ b/include/linux/fortify-string.h
> >>>> @@ -549,7 +549,8 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
> >>>>                                           const size_t q_size,
> >>>>                                           const size_t p_size_field,
> >>>>                                           const size_t q_size_field,
> >>>> -                                        const u8 func)
> >>>> +                                        const u8 func,
> >>>> +                                        const char *field)
> >>>>   {
> >>>>          if (__builtin_constant_p(size)) {
> >>>>                  /*
> >>>> @@ -610,8 +611,13 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
> >>>>           * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832
> >>>>           */
> >>>>          if (p_size_field != SIZE_MAX &&
> >>>> -           p_size != p_size_field && p_size_field < size)
> >>>> -               return true;
> >>>> +           p_size != p_size_field && p_size_field < size) {
> >>>> +               if (p_size_field == 0)
> >>>> +                       pr_warn("Wanted to write %zu to a 0-sized destination: %s\n",
> >>>> +                               size, field);
> >>>> +               else
> >>>> +                       return true;
> >>>> +       }
> >>>>
> >>>>          return false;
> >>>>   }
> >>>> @@ -625,7 +631,8 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size,
> >>>>          const size_t __q_size_field = (q_size_field);                   \
> >>>>          fortify_warn_once(fortify_memcpy_chk(__fortify_size, __p_size,  \
> >>>>                                       __q_size, __p_size_field,          \
> >>>> -                                    __q_size_field, FORTIFY_FUNC_ ##op), \
> >>>> +                                    __q_size_field, FORTIFY_FUNC_ ##op,\
> >>>> +                                    #p " at " FILE_LINE), \
> >>>>                    #op ": detected field-spanning write (size %zu) of single %s (size %zu)\n", \
> >>>>                    __fortify_size,                                       \
> >>>>                    "field \"" #p "\" at " FILE_LINE,                     \
> >>>>
> >>>>
> >>>> --
> >>>> Kees Cook
> >>>>
> >>> Hi Kees,
> >>> Sorry for the delay in responding. Your patch also caused the system
> >>> to hang booting, but I took it as inspiration to try some things.
> >>> TL;DR I see these two print several times:
> >>>
> >>> Wanted to write 8 to a 0-sized destination: oldptr at
> >>> arch/riscv/errata/thead/errata.c:185
> >>> Wanted to write 28 to a 0-sized destination: oldptr at
> >>> arch/riscv/errata/thead/errata.c:185
> >>>
> >>> Since I suspected CONFIG_ERRATA_THEAD_MAE which relies on
> >>> CONFIG_RISCV_ALTERNATIVE_EARLY I suspected it is just too early for
> >>> pr_*, etc. I tried trace_printk but that didn't produce any traces,
> >>> and then I found a comment which makes me think we're too early for
> >>> that too.
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv/kernel/alternative.c?h=v6.11#n214
> >>>
> >>> I didn't want to give up so I wrote a simple module to figure out how
> >>> to use sbi_ecall and then incorporated that idea into your patch,
> >>> although even that was not straightforward:
> >>> https://gist.github.com/jmontleon/f3536ad70dc92bed992139ad5b266d5c
> >> Oh nice work! Probably something riscv-specific needs to be wired up to the printk routines to do what you have there until printk is fully available  But that's a separate issue, I guess.
> >>
> >>> Full output: https://gist.github.com/jmontleon/823facff38e28b717f153e38adcfd7fe
> >> Does the system boot with your patch?
> > Yes. It boots fully. It also booted with trace_printk, it just didn't
> > produce a trace, which again I think is expected at that stage. It
> > seems as long as there are no 'normal' print attempts, pr_()*, etc. it
> > works.
>
>
> Hmmm I don't see what in your patch would fix your issue.
>
> I sent a patch
> (https://lore.kernel.org/linux-riscv/20240829165048.49756-1-alexghiti@rivosinc.com/)
> that was merged in v6.11-rc7 which fixes something similar, could that
> be what fixed your problem?
>

Hi Alex, Thank you for having a look.

Although I narrowed it down to starting with 6.11-rc1, the first time
I encountered this was actually with an rc7 Fedora build.
http://riscv.rocks/koji/buildinfo?buildID=338433

I also did a build from upstream 6.11.1 last night and still see the
problem trying to boot with it this morning.

> The problem with this early code is that the MMU is not enabled yet and
> if your kernel is relocatable, the relocations are set "virtually" and
> then without the MMU, a call to an external function results in a
> trap...Which happens way too often, I have a few ideas to fix that, I
> need to find time to tackle this though.
>
> I don't know FORTIFY_SOURCE, but I guess this could cause a call to an
> external function and then cause the issue I describe above right? We
> already remove all instrumentations from this early code so maybe we
> should remove FORTIFY_SOURCE too?
>
What you are describing here does not seem unreasonable to me, but I
simply do not know enough to say one way or the other.
Is there something I can do to collect more information to determine
if this is the cause?

Thank you again for having a look,
Jason

> Thanks for digging into this,
>
> Alex
>
>
> >
> >> I would guess that this line at arch/riscv/errata/thead/errata.c:168:
> >>
> >> void *oldptr, *altptr;
> >>
> >> Should be:
> >>
> >> u8 *oldptr, *altptr;
> > I tried this, but it didn't appear to make a difference, I will try
> > some more today, though I admit to being out of my element.
> >
> >
> >
> >> But maybe __ALT_PTR needs adjustment too? Hmm. I would have expected void * to have a size of SIZE_MAX here, though, so something else might be confusing the check.
> >>
> >>
> >> --
> >> Kees Cook
> >>
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>