Crash on armv7-a using KASAN

Linus Walleij linus.walleij at linaro.org
Tue Oct 15 06:51:02 PDT 2024


On Tue, Oct 15, 2024 at 12:28 PM Mark Rutland <mark.rutland at arm.com> wrote:
> On Mon, Oct 14, 2024 at 03:19:49PM +0200, Clement LE GOFFIC wrote:

> I think what's happening here is that when switching from prev to next
> in the scheduler, we switch to next's mm before we actually switch to
> next's register state, and there's a transient window where prev is
> executed using next's mm. AFAICT we don't map prev's KASAN stack shadow
> into next's mm anywhere, and so inlined KASAN_STACK checks recursively
> fault on this until we switch to the overflow stack.

Oh my, that's pretty advanced. Well spotted!
So it has nothing to do with Ard's commit; correlation does not
imply causation.

> More details on that below.
>
> Linus, are you able to look into this?

Of course, I'm trying to reproduce the bug.

> >  __dabt_svc from do_translation_fault+0x30/0x2b0
> >  do_translation_fault from do_DataAbort+0x74/0x1dc
> >  do_DataAbort from __dabt_svc+0x4c/0x80
> > Exception stack(0xac003ad8 to 0xac003b20)
> > 3ac0:                                                       ac003bc8 00000005
> > 3ae0: ac003b88 74800779 7480078f ac003b88 7480078f ac003b88 00000005 82412640
> > 3b00: ac003d20 ac003d54 00000051 ac003b28 80125c14 80125920 200f0193 ffffffff
> >  __dabt_svc from do_translation_fault+0x30/0x2b0
> >  do_translation_fault from do_DataAbort+0x74/0x1dc
> >  do_DataAbort from __dabt_svc+0x4c/0x80
> > Exception stack(0xac003b88 to 0xac003bd0)
> > 3b80:                   ac003c78 00000805 ac003c38 7480078f 74800798 ac003c38
> > 3ba0: 74800798 ac003c38 00000805 82412640 ac003d20 ac003d54 00000051 ac003bd8
> > 3bc0: 80125c14 80125920 200f0193 ffffffff
> >  __dabt_svc from do_translation_fault+0x30/0x2b0
> >  do_translation_fault from do_DataAbort+0x74/0x1dc
> >  do_DataAbort from __dabt_svc+0x4c/0x80
>
> The above frames are the same; whatever the kernel is accessing at
> do_translation_fault+0x30 is causing this to go recursive...
>
> I can reproduce this, pretty easily, with a similar enough trace, though
> faddr2line isn't happy to give me a line number.
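
A quick arithmetic check on the exception stack dumps quoted above seems
consistent with the stack-shadow theory. Generic KASAN maps an address to
its shadow byte as (addr >> 3) + KASAN_SHADOW_OFFSET. Assuming an offset of
0x5f000000 (an assumption picked to match a kernel linked at 0x80000000; it
is not stated anywhere in this thread), the stack addresses in the dumps map
onto the 0x748007xx values seen in the neighbouring registers. kasan_shadow
is a made-up helper name:

```shell
# Map a kernel address to its generic-KASAN shadow address.
# $1 = address, $2 = shadow offset (0x5f000000 is an assumed value).
kasan_shadow() {
    printf '0x%08x\n' $(( ($1 >> 3) + $2 ))
}

kasan_shadow 0xac003bc8 0x5f000000   # prints 0x74800779
kasan_shadow 0xac003c78 0x5f000000   # prints 0x7480078f
```

Both results show up in the register dumps, which would fit the faulting
accesses being reads of the shadow for the stack itself.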

Did you reproduce it the same way with a few find /?

I am trying to reproduce it and failing :/
(Using Torvalds' HEAD)

This is my config:

CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_WORKING_NOSANITIZE_ADDRESS=y
CONFIG_KASAN=y
CONFIG_CC_HAS_KASAN_MEMINTRINSIC_PREFIX=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
# CONFIG_KASAN_STACK is not set
# CONFIG_KASAN_VMALLOC is not set
# CONFIG_KASAN_EXTRA_INFO is not set

Do you use more KASAN?
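
The quoted analysis points at inlined KASAN_STACK checks, while the config
above has CONFIG_KASAN_STACK and CONFIG_KASAN_INLINE unset, so that
difference may matter for reproduction (a guess, not confirmed in this
thread). A small sketch for comparing two configs; kconfig_enabled and
kasan_summary are made-up helper names, not kernel scripts:

```shell
# Succeeds if OPTION ($2, without the CONFIG_ prefix) is =y in file $1.
kconfig_enabled() {
    grep -q "^CONFIG_$2=y\$" "$1" 2>/dev/null
}

# Print the state of the KASAN options listed in the config above.
kasan_summary() {
    for opt in KASAN KASAN_OUTLINE KASAN_INLINE KASAN_STACK KASAN_VMALLOC; do
        if kconfig_enabled "$1" "$opt"; then
            echo "$opt=y"
        else
            echo "$opt unset"
        fi
    done
}

# Usage: kasan_summary .config
```

Running it on both the failing and the non-failing config should make any
KASAN_STACK or KASAN_INLINE difference obvious.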

Then I run:

${QEMU} -M vexpress-a15 -m 512M -no-reboot -smp cpus=2 \
  -kernel ${ZIMAGE} -dtb ${DTB} \
  -append "root=/dev/mmcblk0 rw roottype=ext4 console=ttyAMA0" \
  -serial stdio \
  -drive if=sd,driver=raw,cache=writeback,file=./arch_rootfs.ext4

This is a rootfs with Debian.

Then I fork a few find /|grep fnord > /dev/null &

root at vexpress:~# find / |grep fnord > /dev/null &
[1] 554
root at vexpress:~# find / |grep fnord > /dev/null &
[2] 556
root at vexpress:~# find / |grep fnord > /dev/null &
[3] 558
root at vexpress:~# find / |grep fnord > /dev/null &
[4] 560
root at vexpress:~# find / |grep fnord > /dev/null &
[5] 562
root at vexpress:~# find / |grep fnord > /dev/null &
[6] 564
root at vexpress:~# find / |grep fnord > /dev/null &
[7] 566
root at vexpress:~# find / |grep fnord > /dev/null &
[8] 568
root at vexpress:~# find / |grep fnord > /dev/null &
[9] 570
root at vexpress:~# find / |grep fnord > /dev/null &
[10] 572
root at vexpress:~# find / |grep fnord > /dev/null &
[11] 574
root at vexpress:~# find / |grep fnord > /dev/null &
[12] 576
root at vexpress:~# find / |grep fnord > /dev/null &
[13] 578
root at vexpress:~# find / |grep fnord > /dev/null &
[14] 580
root at vexpress:~# find / |grep fnord > /dev/null &
[15] 582
root at vexpress:~# find / |grep fnord > /dev/null &
[16] 584
root at vexpress:~# find / |grep fnord > /dev/null &
[17] 586
root at vexpress:~# find / |grep fnord > /dev/null &
[18] 588
root at vexpress:~# find / |grep fnord > /dev/null &
[19] 590
root at vexpress:~# find / |grep fnord > /dev/null &
[20] 592
root at vexpress:~#
root at vexpress:~# ps
  PID TTY          TIME CMD
  291 ttyAMA0  00:00:02 login
  550 ttyAMA0  00:00:01 bash
  553 ttyAMA0  00:00:06 find
  554 ttyAMA0  00:00:00 grep
  555 ttyAMA0  00:00:04 find
  556 ttyAMA0  00:00:00 grep
  557 ttyAMA0  00:00:03 find
  558 ttyAMA0  00:00:00 grep
  559 ttyAMA0  00:00:03 find
  560 ttyAMA0  00:00:00 grep
  561 ttyAMA0  00:00:03 find
  562 ttyAMA0  00:00:00 grep
  563 ttyAMA0  00:00:02 find
  564 ttyAMA0  00:00:00 grep
  565 ttyAMA0  00:00:02 find
  566 ttyAMA0  00:00:00 grep
  567 ttyAMA0  00:00:02 find
  568 ttyAMA0  00:00:00 grep
  569 ttyAMA0  00:00:02 find
  570 ttyAMA0  00:00:00 grep
  571 ttyAMA0  00:00:02 find
  572 ttyAMA0  00:00:00 grep
  573 ttyAMA0  00:00:01 find
  574 ttyAMA0  00:00:00 grep
  575 ttyAMA0  00:00:01 find
  576 ttyAMA0  00:00:00 grep
  577 ttyAMA0  00:00:01 find
  578 ttyAMA0  00:00:00 grep
  579 ttyAMA0  00:00:01 find
  580 ttyAMA0  00:00:00 grep
  581 ttyAMA0  00:00:01 find
  582 ttyAMA0  00:00:00 grep
  583 ttyAMA0  00:00:01 find
  584 ttyAMA0  00:00:00 grep
  585 ttyAMA0  00:00:01 find
  586 ttyAMA0  00:00:00 grep
  587 ttyAMA0  00:00:01 find
  588 ttyAMA0  00:00:00 grep
  589 ttyAMA0  00:00:01 find
  590 ttyAMA0  00:00:00 grep
  591 ttyAMA0  00:00:01 find
  592 ttyAMA0  00:00:00 grep
  593 ttyAMA0  00:00:01 ps
root at vexpress:~#

This refused to crash.
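
The manual session above can be scripted to make retries less tedious. A
minimal sketch; spawn_finds is a made-up helper name, and the default of 20
is only there because that is how many pipelines were started by hand:

```shell
# Start N background "find | grep" pipelines over ROOT.
# $1 = number of pipelines (default 20), $2 = directory to walk (default /).
# Prints the number of pipelines started.
spawn_finds() {
    n=${1:-20}
    root=${2:-/}
    i=0
    while [ "$i" -lt "$n" ]; do
        find "$root" 2>/dev/null | grep fnord > /dev/null &
        i=$((i + 1))
    done
    echo "$i"
}

# Usage: spawn_finds 20 / ; wait
```

The trailing wait blocks until every pipeline has finished, so the whole
thing can be wrapped in a loop for repeated attempts.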

Then I recompiled with GCC, since I had been using LLVM Clang.
Same result: no crash.

> The relevant asm is:
(...)
> ... so we're using the new task's mm, but still executing in the context of the
> old task (and using its stack). I suspect the new task's mm doesn't have the
> old task's stack shadow mapped in, and AFAICT we don't map that in explicitly
> anywhere before we switch to the new mm.
>
> Linus, can you look into that?

Yeah, it looks like a spot-on identification of the problem. I can try to
think about how we could fix this once I can reproduce it; I keep trying
to provoke the crash :/

Yours,
Linus Walleij



More information about the linux-arm-kernel mailing list