[PATCH] arm: port KCOV to arm

Fri Apr 27 09:33:43 PDT 2018

On Fri, Apr 27, 2018 at 06:21:53PM +0200, 'Dmitry Vyukov' via syzkaller wrote:
> On Fri, Apr 27, 2018 at 6:18 PM, Mark Rutland <mark.rutland at arm.com> wrote:
> > On Fri, Apr 27, 2018 at 03:51:22PM +0200, Dmitry Vyukov wrote:
> >> On Fri, Apr 27, 2018 at 3:06 PM, Mark Rutland <mark.rutland at arm.com> wrote:
> >> > Can you share your kernel config?
> >>
> >> Attached. It's pretty much vexpress_defconfig with a few minor
> >> additions. Here is a full description of what I am doing:
> >> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_arm-kernel.md
> >
> > Cheers!
> >
> >> FWIW when I do "KCOV_INSTRUMENT_fault.o := n" everything works and I
> >> see reasonable coverage.
> >
> > While this may be the case, I think it's papering over a bug rather than
> > solving it.
> >
> > [...]
> >
> >> > I can't reproduce the issue on real hardware atop v4.17-rc2, when
> >> > booting and running a standard ARMv7 buildroot userspace. So the kcov
> >> > mode check seems fine to me.
> >>
> >> It happens after brief fuzzing with syzkaller. So it's both kcov
> >> opened and some weird syscall workload. Again, here is everything that
> >> I am doing:
> >> https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_arm-kernel.md
> >
> > I've set this up, and while I see RCU stalls and "no output from test
> > machine" warnings, I'm not seeing any reports with KCOV splats.
> >
> > Are you somehow connecting to a VM which failed with no output?
> 
> I've started seeing assorted crashes like these:
> 
> kernel panic: Fatal exception
> unable to handle kernel paging request in migrate_task_rq_fair
> BUG: spinlock bad magic in corrupted
> unable to handle kernel paging request in trace_hardirqs_off_caller
> unable to handle kernel paging request in kick_process
> kernel panic: stack-protector: Kernel stack is corrupted in: do_futex
> unable to handle kernel paging request in __sanitizer_cov_trace_pc
> Unable to handle kernel paging request at virtual address ADDR

Just to check, is that with or without instrumentation in fault.c?

It might be worth enabling HARDENED_USERCOPY -- that should scream if we
corrupt task_struct via a uaccess.

> Do you see code coverage increasing?

Not so far. QEMU TCG on this machine is rather slow, so it might just be
that VMs are timing out at boot time.
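
For a check that doesn't depend on the fuzzer, coverage can be read
directly from the kcov buffer inside the VM. Below is a rough sketch
based on the example in Documentation/dev-tools/kcov.rst, assuming
debugfs is mounted at /sys/kernel/debug. The ioctl numbers and the
buffer layout (word 0 holds the PC count, followed by the PCs) are
taken from that document; the helper names count_unique_pcs() and
trace_one_syscall() and the buffer size are made up for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* ioctl numbers as listed in Documentation/dev-tools/kcov.rst */
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE     _IO('c', 100)
#define KCOV_DISABLE    _IO('c', 101)
#define KCOV_TRACE_PC   0
#define COVER_WORDS     (64 << 10) /* buffer size in words; arbitrary choice */

static int cmp_pc(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/*
 * Count distinct PCs in a kcov-style buffer: cover[0] is the number of
 * recorded PCs and cover[1..n] are the PCs themselves.
 * Returns (size_t)-1 if the scratch allocation fails.
 */
size_t count_unique_pcs(const unsigned long *cover)
{
	size_t n = cover[0], uniq = 0, i;
	unsigned long *pcs;

	if (n == 0)
		return 0;
	pcs = malloc(n * sizeof(*pcs));
	if (!pcs)
		return (size_t)-1;
	memcpy(pcs, cover + 1, n * sizeof(*pcs));
	qsort(pcs, n, sizeof(*pcs), cmp_pc);
	for (i = 0; i < n; i++)
		if (i == 0 || pcs[i] != pcs[i - 1])
			uniq++;
	free(pcs);
	return uniq;
}

/*
 * Trace a single (deliberately failing) read() on the calling thread.
 * Returns the number of distinct PCs recorded, or -1 if kcov could not
 * be opened, initialised, mapped or enabled.
 */
long trace_one_syscall(void)
{
	const size_t len = COVER_WORDS * sizeof(unsigned long);
	unsigned long *cover;
	long uniq = -1;
	ssize_t r;
	int fd;

	fd = open("/sys/kernel/debug/kcov", O_RDWR);
	if (fd < 0)
		return -1;
	if (ioctl(fd, KCOV_INIT_TRACE, COVER_WORDS))
		goto out_close;
	cover = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (cover == MAP_FAILED)
		goto out_close;
	if (ioctl(fd, KCOV_ENABLE, KCOV_TRACE_PC))
		goto out_unmap;

	/* Reset the PC count, then run the syscall to be traced. */
	__atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
	r = read(-1, NULL, 0); /* expected to fail with EBADF */
	(void)r;

	ioctl(fd, KCOV_DISABLE, 0);
	uniq = (long)count_unique_pcs(cover);
out_unmap:
	munmap(cover, len);
out_close:
	close(fd);
	return uniq;
}
```

If trace_one_syscall() returns -1 or 0 inside the guest, the problem is
in the kcov setup rather than in VM boot time; a positive count means
instrumentation is at least recording PCs for that thread.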

> Besides the compiler, I am not sure what else can be different between our
> setups (mine is Debian's 7.2).

Could you give mine [1] a go? It's the Linaro 17.11
arm-linux-gnueabihf-gcc 7.2.1 toolchain.

I don't have a Debian 7 install up at the moment.

[1] https://releases.linaro.org/components/toolchain/binaries/latest/arm-linux-gnueabihf/gcc-linaro-7.2.1-2017.11-x86_64_arm-linux-gnueabihf.tar.xz

Thanks,
Mark.