Arm + KASAN + syzbot

Mark Rutland mark.rutland at arm.com
Tue Jan 19 08:00:10 EST 2021


On Tue, Jan 19, 2021 at 11:34:33AM +0100, 'Dmitry Vyukov' via syzkaller wrote:
> On Tue, Jan 19, 2021 at 11:04 AM Mark Rutland <mark.rutland at arm.com> wrote:
> > On Mon, Jan 18, 2021 at 05:31:36PM +0100, 'Dmitry Vyukov' via syzkaller wrote:
> > It might be best to use `-machine virt` here instead; that way QEMU
> > won't need to emulate any of the real vexpress HW, and the kernel won't
> > need to waste any time poking it.
> 
> Hi Mark,
> 
> The whole point of setting up an Arm instance is getting as much
> coverage we can't get on x86_64 instances as possible. The instance
> will use qemu emulation (extremely slow) and limited capacity.
> I see some drivers and associated hardware support as one of the main
> such areas. That's why I tried to use vexpress-a15. And it boots
> without KASAN, so presumably it can be used in general.

Fair enough.

I had assumed that your first aim would be to cover the arch code shared
across all arm platforms, to flush out any big/common problems first,
for which the virt platform is a good start, and has worked quite well
for arm64.

[...]

> > > 3. CONFIG_KCOV does not seem to fully work.
> > > It seems to work except for when the kernel crashes, and that's the
> > > most interesting scenario for us. When the kernel crashes for other
> > > reasons, crash handlers re-crash in KCOV making all crashes
> > > unactionable and indistinguishable.
> > > Here are some samples (search for __sanitizer_cov_trace):
> > > https://gist.githubusercontent.com/dvyukov/c8a7ff1c00a5223c5143fd90073f5bc4/raw/c0f4ac7fd7faad7253843584fed8620ac6006338/gistfile1.txt
> >
> > Most of those are all small offsets from 0, which suggests an offset is
> > being added to a NULL pointer somewhere, which I suspect means
> > task_struct::kcov_area is NULL. We could hack-in a check for that, and
> > see if that's the case (though I can't see how from a quick scan of the
> > kcov code).
> 
> My first guess would be that current itself is NULL.

I think if that were to happen (which'd imply corruption of thread_info)
the fault handling and logging would also blow up, so I suspect this
isn't the case. 

Do you have a relevant vmlinux to hand? With that we could figure out
which access is faulting, how the address is being generated, and where
the bogus address is coming from, without having to guess. :)
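
As a rough illustration of why small faulting addresses point towards
"NULL plus an offset", here is a userspace sketch of the address
arithmetic. The struct and all function names are hypothetical and do
not match the real task_struct layout; only the arithmetic matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a task structure; NOT the kernel's layout. */
struct fake_task {
	unsigned int kcov_mode;
	unsigned int kcov_size;
	unsigned long *kcov_area;
};

/* Address touched when reading a member through a struct at 'base'. */
static uintptr_t member_address(uintptr_t base, size_t member_offset)
{
	return base + member_offset;
}

/* Address touched when accessing area[index] with word-sized slots. */
static uintptr_t slot_address(uintptr_t area, size_t index)
{
	return area + index * sizeof(unsigned long);
}

/* Heuristic only: an address below one page is likely NULL plus an offset. */
static int looks_like_null_plus_offset(uintptr_t addr, size_t page_size)
{
	return addr < page_size;
}
```

With a NULL base, member_address(0, offsetof(struct fake_task, kcov_area))
is just the member offset, and slot_address(0, n) is n words from 0.
Both hypotheses in this thread (a NULL area or a NULL task pointer)
produce small addresses, which is why the disassembly is needed to tell
them apart.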

> Accesses to current->kcov* are well tested on other arches, including
> using KCOV in interrupts, etc.

While that's generally true, architectures differ in a number of ways
that can affect this (e.g. how the vmalloc area is faulted, what
precisely is preemptible/interruptible), and we had to make preparatory
changes to make KCOV work on arm even though it was working perfectly
fine on arm64 and x86_64, e.g.

* c9484b986ef03492 ("kcov: ensure irq code sees a valid area")
* dc55daff9040a90a ("kcov: prefault the kcov_area")
* 0ed557aa813922f6 ("sched/core / kcov: avoid kcov_area during task switch")

... so I don't think we can rule out the possibility of a latent issue
here, even if we haven't triggered it elsewhere.
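
For readers unfamiliar with the "prefault" idea named in the second
commit title, a minimal userspace analogue follows. It assumes the aim
is to touch one word per page of the coverage buffer up front, so that
no page fault is taken later inside the coverage hook. The function
name and parameters are hypothetical; this is not the kernel code.

```c
#include <stddef.h>

/*
 * Read one word from every page-sized stride of 'area' and return how
 * many reads were issued. The volatile qualifier keeps the compiler
 * from dropping the otherwise unused loads.
 */
static size_t touch_each_page(const volatile unsigned long *area,
			      size_t nwords, size_t page_size)
{
	size_t stride = page_size / sizeof(unsigned long);
	size_t touched = 0;
	size_t offset;

	if (stride == 0)
		return 0;	/* page_size smaller than one word: nothing sensible to do */

	for (offset = 0; offset < nwords; offset += stride) {
		(void)area[offset];	/* forces the access, and thus any fault */
		touched++;
	}

	return touched;
}
```

Calling this once when the buffer is set up moves any faults to a
context where they are safe to handle, which is the kind of
arch-dependent behaviour that may not show up on arm64 or x86_64.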

Thanks,
Mark.


