syzkaller on risc-v

Dmitry Vyukov dvyukov at google.com
Tue Jun 30 08:48:31 EDT 2020


Hello risc-v maintainers,

A few days ago Tobias ported syzkaller (a kernel fuzzer) to the risc-v arch:
https://github.com/google/syzkaller/pull/1867
Tobias also provided nice instructions on how to run it using qemu+buildroot:
https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md
I tried to run it and it works. I wanted to write down some findings
in a public place. Some may be known, some not; some may be easy to
address, some may be harder. For now my goal is just to document them.

1. KASAN does not seem to work.
I've tried both v5.8-rc2 and 1590a2e1c681b0991bd42c992cabfd380e0338f2,
with and without KASAN and KCOV, with both inline and outline
instrumentation, and all experiments point to broken KASAN. Boot gets
to the "INSTRUCTION SETS WANT TO BE FREE" banner and then hangs dead in
secondary_start_common; you can see some details here:
https://github.com/google/syzkaller/pull/1875#issuecomment-650545255
KASAN would be a prerequisite for testing risc-v on syzbot.
The recent KCOV patch works well, though.

2. I've also tried to convert our beefy syzbot config for x86_64, which
includes lots of both debug configs and subsystem configs:
https://github.com/google/syzkaller/blob/master/dashboard/config/upstream-kasan.config
I ran it through olddefconfig for risc-v, disabled KASAN, tried to
boot, and got a similar boot hang. I did not try to bisect the config
further.

3. Running with a small config (defconfig+KCOV), I initially got stack
overflows all over the place. Here are some samples:
https://gist.githubusercontent.com/dvyukov/0b6c7d93e2059f91241677a115c8e1ef/raw/947b7626f724262ba6fa3eb67b81f1a3f65cb419/gistfile1.txt
I ended up doing:

--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
-#define THREAD_SIZE_ORDER      (1)
+#define THREAD_SIZE_ORDER      (2)

This eliminated the stack overflows.
KCOV may increase stack usage a bit, but not radically like KASAN does,
so I would assume some stack overflows can happen without KCOV as well.
So either we need this change unconditionally, or we should at least
bump the stack size under KCOV.
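
To make the second option concrete, here is a rough standalone sketch
of bumping the order only under KCOV. It is not a tested kernel patch.
It assumes 4 KiB pages and that THREAD_SIZE is derived as
PAGE_SIZE << THREAD_SIZE_ORDER; CONFIG_KCOV is defined by hand to stand
in for Kconfig, and thread_size_for_order() is a hypothetical helper
added only to compare the two settings.

```c
#include <assert.h>   /* static_assert (C11) */

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Stand-in for the Kconfig symbol; a real build gets it from .config. */
#define CONFIG_KCOV 1

/* Hypothetical variant of the change above: bump only under KCOV. */
#ifdef CONFIG_KCOV
#define THREAD_SIZE_ORDER      (2)
#else
#define THREAD_SIZE_ORDER      (1)
#endif
#define THREAD_SIZE            (PAGE_SIZE << THREAD_SIZE_ORDER)

/* Kernel stack size in bytes for a given order (illustration only). */
static inline unsigned long thread_size_for_order(unsigned int order)
{
	return PAGE_SIZE << order;
}

/* Order 2 with 4 KiB pages gives a 16 KiB stack; order 1 gives 8 KiB. */
static_assert(THREAD_SIZE == 16384, "expected a 16 KiB stack under KCOV");
```

Under these assumptions the change doubles the per-thread kernel stack
from 8 KiB to 16 KiB.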

4. In lots of cases I did not get meaningful stack traces.
E.g. WARNINGs don't unwind past the exception, which makes the stack useless:
https://gist.githubusercontent.com/dvyukov/717c748dd5cc20f2214026331467cd9f/raw/dd5da078a0bc0210ecf00bdee1112d610305189c/gistfile1.txt
This also happened a dozen times for stack overflows:
https://gist.githubusercontent.com/dvyukov/6f58a866c8ba53343fd2142b1dfcfffa/raw/1ac463c5924fa53fbe99fd8a4e093af3e3429c0f/gistfile1.txt
Also, RCU stalls did not get stacks past the timer interrupt:
https://gist.githubusercontent.com/dvyukov/bbad28c67d55fb4e12936da13c533cf5/raw/fb41b4805238fed753b39641d6c7e496519f7f56/gistfile1.txt
And various kinds of exceptions did not get any meaningful stack traces:
https://gist.githubusercontent.com/dvyukov/59fa9ef0f8e1f780c75a2f561b1efd24/raw/91e1f60c23992e6985fc155c2cfb081a30da7662/gistfile1.txt
This makes debugging hard, and stack traces are also required for
proper bug bucketing by syzkaller.
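
To illustrate why truncated traces hurt bucketing, here is a
deliberately simplified sketch. It is not syzkaller's actual code (that
is written in Go and is far more involved). It assumes a crash is
bucketed by the first frame that is not generic reporting machinery;
the skip list and the bucket_frame() name are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of frames that say nothing about the bug itself. */
static const char *const skip_frames[] = {
	"dump_stack", "show_stack", "panic", "__warn", "report_bug",
};

/*
 * Return the first frame that is not in skip_frames, or NULL if every
 * frame is skipped (or the trace is empty).
 */
static const char *bucket_frame(const char *const *frames, size_t n)
{
	const size_t nskip = sizeof(skip_frames) / sizeof(skip_frames[0]);

	for (size_t i = 0; i < n; i++) {
		int skip = 0;

		for (size_t j = 0; j < nskip; j++) {
			if (strcmp(frames[i], skip_frames[j]) == 0) {
				skip = 1;
				break;
			}
		}
		if (!skip)
			return frames[i];
	}
	return NULL;
}
```

If unwinding stops at the exception entry, every WARNING ends its trace
in the same trap-handling frames, so a scheme like this would put
unrelated bugs into one bucket.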

5. Once we have proper stack traces, we will need to extend the syzkaller
test case base to include samples of risc-v crashes:
https://github.com/google/syzkaller/tree/master/pkg/report/testdata/linux/report
and the crash parsing code to properly understand and bucket these crashes:
https://github.com/google/syzkaller/blob/master/pkg/report/linux.go#L914-L1685

6. I observed lots of what looks like user-space process memory
corruptions. These included thousands of panics in our Go programs
with things that I would consider "impossible"; at least they did not
come up before in our syzbot fuzzing. There were also some Go runtime
"impossible" crashes, e.g.:
https://gist.githubusercontent.com/dvyukov/fb489ed93f7180621c71714ee07e53dc/raw/a7d2e98a56da17af2aec79c164cd3a8e154ecf5c/gistfile1.txt
Maybe it's a known issue? Should we use Go tip instead of 1.14? Is it
more stable?
Though it's not necessarily Go, because the kernel contains hundreds of
memory corruptions and we routinely observe the kernel corrupting
user-space processes. This is especially true without KASAN, because
kernel corruptions are not caught early. However, the ratio and nature
of the crashes make me suspect some issue in the Go risc-v runtime.

Thanks


