[PATCH -fixes v3 0/6] Fixes KASAN and other along the way

Aleksandr Nogikh nogikh at google.com
Thu Mar 24 09:53:30 PDT 2022


On Thu, Mar 10, 2022 at 9:42 AM
Alexandre Ghiti <alexandre.ghiti at canonical.com> wrote:
>
> Hi,
>
> On Wed, Mar 9, 2022 at 11:52 AM Dmitry Vyukov <dvyukov at google.com> wrote:
> >
> > On Wed, 9 Mar 2022 at 11:45, Aleksandr Nogikh <nogikh at google.com> wrote:
> > >
> > > I switched the riscv syzbot instance to KASAN_OUTLINE and now it is
> > > finally being fuzzed again!
> > >
> > > Thank you very much for the series!
> >
> >
> > But all riscv crashes are still classified as "corrupted" and thrown
> > away (not reported):
> > https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
> >
> > The problem is that riscv oopses don't contain "Call Trace:" at the
> > beginning of stack traces, so it's hard to make sense of them.
> > arch/riscv seems to print "Call Trace:" in the wrong function, not where
> > all other arches print it.
> >
>
> Does the following diff fix this issue?
>
> diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
> index 201ee206fb57..348ca19ccbf8 100644
> --- a/arch/riscv/kernel/stacktrace.c
> +++ b/arch/riscv/kernel/stacktrace.c
> @@ -109,12 +109,12 @@ static bool print_trace_address(void *arg, unsigned long pc)
>  noinline void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
>                     const char *loglvl)
>  {
> +       pr_cont("%sCall Trace:\n", loglvl);
>         walk_stackframe(task, regs, print_trace_address, (void *)loglvl);
>  }
>
>  void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
>  {
> -       pr_cont("%sCall Trace:\n", loglvl);
>         dump_backtrace(NULL, task, loglvl);
>  }
>
> Thanks,
>
> Alex

I wouldn't say that all riscv crashes are ending up in the "corrupted
report" bucket, but for some classes of errors there are definitely
differences from other architectures and they prevent syzkaller from
making sense out of those reports. At the moment everything seems to
be working fine at least with "WARNING:", "KASAN:" and "kernel
panic:".

I've run syzkaller with and without the small patch. From what I
observed, it definitely helps with the "BUG: soft lockup in" class of
reports. Previously they were declared corrupted, now syzkaller parses
them normally.
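
To make the dependency on the marker concrete, here is a rough sketch in
Python of the kind of check involved. It is not syzkaller's actual parser
(that one is written in Go and is considerably more involved); the helper
name and the sample log lines below are made up for illustration only.

```python
import re

# Hypothetical helper, not part of syzkaller: report whether a
# "Call Trace:" line shows up somewhere after the line containing `title`.
def has_call_trace_after(log, title):
    lines = log.splitlines()
    for i, line in enumerate(lines):
        if title in line:
            # Accept an optional prefix (e.g. a timestamp) before the marker.
            return any(re.search(r"\bCall Trace:\s*$", rest)
                       for rest in lines[i + 1:])
    return False

# Made-up, shortened log lines used only for illustration.
with_marker = "\n".join([
    "BUG: soft lockup (placeholder title line)",
    "Call Trace:",
    " foo+0x10/0x40",
])
without_marker = "\n".join([
    "BUG: soft lockup (placeholder title line)",
    " foo+0x10/0x40",
])

print(has_call_trace_after(with_marker, "soft lockup"))
print(has_call_trace_after(without_marker, "soft lockup"))
```

With the marker missing, a check like this has nothing to anchor the
frames to, which matches the "corrupted" classification seen before the
patch.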

There's still a problem with "INFO: rcu_preempt detected stalls on
CPUs/tasks", which might be a bit more complicated than just the Call
Trace printing location.

Here's an example of such a report from x86: https://pastebin.com/KMEE5YRf
It starts with a header titled "rcu: INFO: rcu_preempt detected stalls
on CPUs/tasks:"
(https://elixir.bootlin.com/linux/v5.17/source/kernel/rcu/tree_stall.h#L520),
followed by a backtrace for one CPU
(https://elixir.bootlin.com/linux/v5.17/source/kernel/rcu/tree_stall.h#L331),
then another error message about the starving kthread
(https://elixir.bootlin.com/linux/v5.17/source/kernel/rcu/tree_stall.h#L442),
and finally two kthread-related traces.

And here's a report from riscv: https://pastebin.com/pN4rUjSi
There's effectively no backtrace between "rcu: INFO: rcu_preempt detected
stalls on CPUs/tasks:" and "rcu: RCU grace-period kthread stack
dump:".
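
For anyone who wants to check their own logs for this, a minimal Python
sketch follows. It only counts frame-looking lines between the two
messages quoted above; the function name and the sample logs are
hypothetical, and the frame pattern simply assumes the usual
"symbol+0xoffset/0xsize" form.

```python
import re

STALL_HEADER = "rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:"
KTHREAD_DUMP = "rcu: RCU grace-period kthread stack dump:"
# Assumption: frames are printed as "symbol+0xoffset/0xsize".
FRAME_RE = re.compile(r"[\w.]+\+0x[0-9a-f]+/0x[0-9a-f]+")

def frames_between_markers(log):
    """Return frame-looking lines between the stall header and the kthread dump."""
    frames = []
    inside = False
    for line in log.splitlines():
        if STALL_HEADER in line:
            inside = True
        elif KTHREAD_DUMP in line:
            break
        elif inside and FRAME_RE.search(line):
            frames.append(line.strip())
    return frames

# Made-up sample logs, only to show the two shapes described above.
x86_like = "\n".join([
    STALL_HEADER,
    "Call Trace:",
    " foo+0x10/0x40",
    " bar+0x20/0x80",
    KTHREAD_DUMP,
    " baz+0x1/0x2",
])
riscv_like = "\n".join([
    STALL_HEADER,
    "(no frames printed here)",
    KTHREAD_DUMP,
    " baz+0x1/0x2",
])

print(frames_between_markers(x86_like))
print(frames_between_markers(riscv_like))
```

An empty result for a stall report corresponds to the riscv case above.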


>
> >
> >
> > > --
> > > Best Regards,
> > > Aleksandr
> > >
> > > On Fri, Mar 4, 2022 at 5:12 AM Palmer Dabbelt <palmer at dabbelt.com> wrote:
> > > >
> > > > On Tue, 01 Mar 2022 09:39:54 PST (-0800), Palmer Dabbelt wrote:
> > > > > On Fri, 25 Feb 2022 07:00:23 PST (-0800), glider at google.com wrote:
> > > > >> On Fri, Feb 25, 2022 at 3:47 PM Alexandre Ghiti <
> > > > >> alexandre.ghiti at canonical.com> wrote:
> > > > >>
> > > > >>> On Fri, Feb 25, 2022 at 3:31 PM Alexander Potapenko <glider at google.com>
> > > > >>> wrote:
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > On Fri, Feb 25, 2022 at 3:15 PM Alexandre Ghiti <
> > > > >>> alexandre.ghiti at canonical.com> wrote:
> > > > >>> >>
> > > > >>> >> On Fri, Feb 25, 2022 at 3:10 PM Alexander Potapenko <glider at google.com>
> > > > >>> wrote:
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> > On Fri, Feb 25, 2022 at 3:04 PM Alexandre Ghiti <
> > > > >>> alexandre.ghiti at canonical.com> wrote:
> > > > >>> >> >>
> > > > >>> >> >> On Fri, Feb 25, 2022 at 2:06 PM Marco Elver <elver at google.com>
> > > > >>> wrote:
> > > > >>> >> >> >
> > > > >>> >> >> > On Fri, 25 Feb 2022 at 13:40, Alexandre Ghiti
> > > > >>> >> >> > <alexandre.ghiti at canonical.com> wrote:
> > > > >>> >> >> > >
> > > > > >>> >> >> > > As reported by Aleksandr, syzbot riscv is broken since commit
> > > > > >>> >> >> > > 54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit
> > > > > >>> >> >> > > actually breaks KASAN_INLINE which is not fixed in this series,
> > > > > >>> >> >> > > that will come later when found.
> > > > > >>> >> >> > >
> > > > > >>> >> >> > > Nevertheless, this series fixes small things that made the syzbot
> > > > > >>> >> >> > > configuration + KASAN_OUTLINE fail to boot.
> > > > > >>> >> >> > >
> > > > > >>> >> >> > > Note that even though the config at [1] boots fine with this
> > > > > >>> >> >> > > series, I was not able to boot the small config at [2] which fails
> > > > > >>> >> >> > > because kasan_poison receives a really weird address
> > > > > >>> >> >> > > 0x4075706301000000 (maybe a kasan person could provide some hint
> > > > > >>> >> >> > > about what happens below in do_ctors -> __asan_register_globals):
> > > > >>> >> >> >
> > > > > >>> >> >> > asan_register_globals is responsible for poisoning redzones around
> > > > > >>> >> >> > globals. As hinted by 'do_ctors', it calls constructors, and in
> > > > > >>> >> >> > this case a compiler-generated constructor that calls
> > > > > >>> >> >> > __asan_register_globals with metadata generated by the compiler.
> > > > > >>> >> >> > That metadata contains information about global variables. Note,
> > > > > >>> >> >> > these constructors are called on initial boot, but also every time
> > > > > >>> >> >> > a kernel module (that has globals) is loaded.
> > > > >>> >> >> >
> > > > >>> >> >> > It may also be a toolchain issue, but it's hard to say. If you're
> > > > >>> >> >> > using GCC to test, try Clang (11 or later), and vice-versa.
> > > > >>> >> >>
> > > > >>> >> >> I tried 3 different gcc toolchains already, but that did not fix the
> > > > >>> >> >> issue. The only thing that worked was setting asan-globals=0 in
> > > > >>> >> >> scripts/Makefile.kasan, but ok, that's not a fix.
> > > > >>> >> >> I tried to bisect this issue but our kasan implementation has been
> > > > >>> >> >> broken quite a few times, so it failed.
> > > > >>> >> >>
> > > > >>> >> >> I keep digging!
> > > > >>> >> >>
> > > > >>> >> >
> > > > > >>> >> > The problem does not reproduce for me with GCC 11.2.0: kernels built
> > > > > >>> >> > with both [1] and [2] are bootable.
> > > > >>> >>
> > > > >>> >> Do you mean you reach userspace? Because my image boots too, and fails
> > > > >>> >> at some point:
> > > > >>> >>
> > > > > >>> >> [    0.000150] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
> > > > >>> >> [    0.015847] Console: colour dummy device 80x25
> > > > >>> >> [    0.016899] printk: console [tty0] enabled
> > > > >>> >> [    0.020326] printk: bootconsole [ns16550a0] disabled
> > > > >>> >>
> > > > >>> >
> > > > >>> > In my case, QEMU successfully boots to the login prompt.
> > > > > >>> > I am running QEMU 6.2.0 (Debian 1:6.2+dfsg-2) and an image Aleksandr
> > > > > >>> > shared with me (guess it was built according to this instruction:
> > > > > >>> > https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md)
> > > > >>> >
> > > > >>>
> > > > >>> Nice thanks guys! I always use the latest opensbi and not the one that
> > > > >>> is embedded in qemu, which is the only difference between your command
> > > > >>> line (which works) and mine (which does not work). So the issue is
> > > > >>> probably there, I really need to investigate that now.
> > > > >>>
> > > > >>> Great to hear that!
> > > > >>
> > > > >>
> > > > >>> That means I only need to fix KASAN_INLINE and we're good.
> > > > >>>
> > > > >>> I imagine Palmer can add your Tested-by on the series then?
> > > > >>>
> > > > >> Sure :)
> > > > >
> > > > > Do you mind actually posting that (i.e., the Tested-by tag)?  It's less
> > > > > likely to get lost that way.  I intend on taking this into fixes ASAP, but
> > > > > my builds have blown up for some reason (I got bounced between machines,
> > > > > so I'm blaming that), so I need to fix that first.
> > > >
> > > > This is on fixes (with a "Tested-by: Alexander Potapenko
> > > > <glider at google.com>"), along with some trivial commit message fixes.
> > > >
> > > > Thanks!
> > > >
> > > > >
> > > > >>
> > > > >>>
> > > > >>> Thanks again!
> > > > >>>
> > > > >>> Alex
> > > > >>>
> > > > >>> >>
> > > > >>> >> It traps here.
> > > > >>> >>
> > > > >>> >> > FWIW here is how I run them:
> > > > >>> >> >
> > > > >>> >> > qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot \
> > > > >>> >> >   -device virtio-rng-pci -machine virt -device \
> > > > >>> >> >   virtio-net-pci,netdev=net0 -netdev \
> > > > >>> >> >   user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device \
> > > > >>> >> >   virtio-blk-device,drive=hd0 -drive \
> > > > >>> >> >   file=${IMAGE},if=none,format=raw,id=hd0 -snapshot \
> > > > > >>> >> >   -kernel ${KERNEL_SRC_DIR}/arch/riscv/boot/Image -append \
> > > > > >>> >> >   "root=/dev/vda console=ttyS0 earlyprintk=serial"
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> >>
> > > > >>> >> >> Thanks for the tips,
> > > > >>> >> >>
> > > > >>> >> >> Alex
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> > --
> > > > >>> >> > Alexander Potapenko
> > > > >>> >> > Software Engineer
> > > > >>> >> >
> > > > >>> >> > Google Germany GmbH
> > > > >>> >> > Erika-Mann-Straße, 33
> > > > >>> >> > 80636 München
> > > > >>> >> >
> > > > >>> >> > Geschäftsführer: Paul Manicle, Liana Sebastian
> > > > >>> >> > Registergericht und -nummer: Hamburg, HRB 86891
> > > > >>> >> > Sitz der Gesellschaft: Hamburg
> > > > >>> >> >
> > > > >>> >> > Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise
> > > > >>> erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes
> > > > >>> weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte
> > > > >>> wissen, dass die E-Mail an die falsche Person gesendet wurde.
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> >
> > > > >>> >> > This e-mail is confidential. If you received this communication by
> > > > >>> mistake, please don't forward it to anyone else, please erase all copies
> > > > >>> and attachments, and please let me know that it has gone to the wrong
> > > > >>> person.
> > > > >>> >>
> > > > >>> >> --
> > > > >>> >> You received this message because you are subscribed to the Google
> > > > >>> Groups "kasan-dev" group.
> > > > >>> >> To unsubscribe from this group and stop receiving emails from it, send
> > > > >>> an email to kasan-dev+unsubscribe at googlegroups.com.
> > > > >>> >> To view this discussion on the web visit
> > > > >>> https://groups.google.com/d/msgid/kasan-dev/CA%2BzEjCsQPVYSV7CdhKnvjujXkMXuRQd%3DVPok1awb20xifYmidw%40mail.gmail.com
> > > > >>> .
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > --
> > > > >>> > Alexander Potapenko
> > > > >>> > Software Engineer
> > > > >>> >
> > > > >>> > Google Germany GmbH
> > > > >>> > Erika-Mann-Straße, 33
> > > > >>> > 80636 München
> > > > >>> >
> > > > >>> > Geschäftsführer: Paul Manicle, Liana Sebastian
> > > > >>> > Registergericht und -nummer: Hamburg, HRB 86891
> > > > >>> > Sitz der Gesellschaft: Hamburg
> > > > >>> >
> > > > >>> > Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten
> > > > >>> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
> > > > >>> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
> > > > >>> dass die E-Mail an die falsche Person gesendet wurde.
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > This e-mail is confidential. If you received this communication by
> > > > >>> mistake, please don't forward it to anyone else, please erase all copies
> > > > >>> and attachments, and please let me know that it has gone to the wrong
> > > > >>> person.
> > > > >>>
> > > > >>> --
> > > > >>> You received this message because you are subscribed to the Google Groups
> > > > >>> "kasan-dev" group.
> > > > >>> To unsubscribe from this group and stop receiving emails from it, send an
> > > > >>> email to kasan-dev+unsubscribe at googlegroups.com.
> > > > >>> To view this discussion on the web visit
> > > > >>> https://groups.google.com/d/msgid/kasan-dev/CA%2BzEjCuJw8N0dUmQNdFqDM96bzKqPDjRe4FUnOCbjhJtO0R8Hg%40mail.gmail.com
> > > > >>> .
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Alexander Potapenko
> > > > >> Software Engineer
> > > > >>
> > > > >> Google Germany GmbH
> > > > >> Erika-Mann-Straße, 33
> > > > >> 80636 München
> > > > >>
> > > > >> Geschäftsführer: Paul Manicle, Liana Sebastian
> > > > >> Registergericht und -nummer: Hamburg, HRB 86891
> > > > >> Sitz der Gesellschaft: Hamburg
> > > > >>
> > > > >> Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten
> > > > >> haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter,
> > > > >> löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen,
> > > > >> dass die E-Mail an die falsche Person gesendet wurde.
> > > > >>
> > > > >>
> > > > >>
> > > > >> This e-mail is confidential. If you received this communication by mistake,
> > > > >> please don't forward it to anyone else, please erase all copies and
> > > > >> attachments, and please let me know that it has gone to the wrong person.



More information about the linux-riscv mailing list