WARNING in __do_kernel_fault

Dmitry Vyukov dvyukov at google.com
Fri Mar 12 10:56:40 GMT 2021


On Wed, Jan 27, 2021 at 6:34 PM Will Deacon <will at kernel.org> wrote:
>
> On Wed, Jan 27, 2021 at 06:24:22PM +0100, Dmitry Vyukov wrote:
> > On Wed, Jan 27, 2021 at 6:15 PM Will Deacon <will at kernel.org> wrote:
> > >
> > > On Wed, Jan 27, 2021 at 06:00:30PM +0100, Dmitry Vyukov wrote:
> > > > On Wed, Jan 27, 2021 at 5:56 PM syzbot
> > > > <syzbot+45b6fce29ff97069e2c5 at syzkaller.appspotmail.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit:    2ab38c17 mailmap: remove the "repo-abbrev" comment
> > > > > git tree:       upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=15a25264d00000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=ad43be24faf1194c
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=45b6fce29ff97069e2c5
> > > > > userspace arch: arm64
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+45b6fce29ff97069e2c5 at syzkaller.appspotmail.com
> > > >
> > > > This happens on an arm64 instance with MTE enabled.
> > > > There is a GPF in reiserfs_xattr_init on x86_64 reported:
> > > > https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde
> > > > so I would assume it's just a plain NULL deref. Is this WARNING not
> > > > indicative of a kernel bug? Or is there something special about this
> > > > particular NULL deref?
> > >
> > > Congratulations, you're the first person to trigger this warning!
> > >
> > > This fires if we take an unexpected data abort in the kernel but when we
> > > get into the fault handler the page-table looks ok (according to the CPU via
> > > an 'AT' instruction). Are you using QEMU system emulation? Perhaps its
> > > handling of AT isn't quite right.
> >
> > Yes, it's qemu-system-aarch64 5.2 with -machine virt,mte=on -cpu max.
> > Do you see any way forward for this issue? Can we somehow prove or
> > disprove that qemu is at fault?
> > The instance just started running, but it seems to be the most common
> > crash so far and it seems to happen on _all_ GPFs.
> > You can see all arm64 crashes so far here:
> > https://syzkaller.appspot.com/upstream?manager=ci-qemu2-arm64-mte
> > They all happen in reiserfs_security_init, but locally I got a bunch
> > of different stacks, e.g.:
>
> Your best bet is to hack is_spurious_el1_translation_fault() to dump addr,
> esr and par, then we can help decipher the logs here. It could also easily be
> a bug in that code, since it hasn't been run before (well, other than
> contrived testing when I wrote it).

Should the dump of addr/esr/par be included in the mainline kernel code
if this WARNING cannot be deciphered without that information?
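
To make concrete what such a dump would let us read off, here is a
minimal userspace sketch in C that decodes the relevant fields of esr
and par. It is not the kernel's code: the helper names (esr_ec,
par_fst, looks_spurious, format_fault_dump) are hypothetical, and
looks_spurious() only models the check as described in this thread
(an EL1 data abort reporting a translation fault, while the AT result
in PAR_EL1 says the translation is fine or fails for another reason).
The bit positions are taken from the architectural ESR_ELx and PAR_EL1
layouts.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ESR_ELx: exception class in bits [31:26], fault status code in bits [5:0]. */
#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3fu
#define ESR_FSC_MASK      0x3fu
#define ESR_EC_DABT_CUR   0x25u  /* data abort taken without a change in EL */

/* PAR_EL1 after an AT instruction: F is bit 0; if F is set, FST is bits [6:1]. */
#define PAR_F             1ull
#define PAR_FST_SHIFT     1
#define PAR_FST_MASK      0x3full

/* Status codes of the form 0b0001LL are translation faults at level LL. */
#define FSC_TYPE_MASK     0x3cu
#define FSC_TYPE_TRANS    0x04u

static unsigned esr_ec(uint32_t esr)  { return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK; }
static unsigned esr_fsc(uint32_t esr) { return esr & ESR_FSC_MASK; }
static bool par_failed(uint64_t par)  { return (par & PAR_F) != 0; }
static unsigned par_fst(uint64_t par) { return (unsigned)((par >> PAR_FST_SHIFT) & PAR_FST_MASK); }

static bool fsc_is_translation_fault(unsigned fsc)
{
    return (fsc & FSC_TYPE_MASK) == FSC_TYPE_TRANS;
}

/*
 * Hypothetical model of the "spurious translation fault" condition:
 * the abort claimed a translation fault, but the AT walk either
 * succeeded or failed with a different fault type.
 */
static bool looks_spurious(uint32_t esr, uint64_t par)
{
    if (esr_ec(esr) != ESR_EC_DABT_CUR || !fsc_is_translation_fault(esr_fsc(esr)))
        return false;
    if (!par_failed(par))
        return true;
    return !fsc_is_translation_fault(par_fst(par));
}

/* Hypothetical one-line dump of the three values plus their decoded fields. */
static int format_fault_dump(char *buf, size_t len,
                             uint64_t addr, uint32_t esr, uint64_t par)
{
    return snprintf(buf, len,
                    "addr=0x%016" PRIx64 " esr=0x%08" PRIx32
                    " (EC=0x%02x FSC=0x%02x) par=0x%016" PRIx64
                    " (F=%d FST=0x%02x) spurious=%d",
                    addr, esr, esr_ec(esr), esr_fsc(esr),
                    par, par_failed(par) ? 1 : 0, par_fst(par),
                    looks_spurious(esr, par) ? 1 : 0);
}
```

With a line like this in the log, the case that matters here is EC=0x25
with a translation-fault FSC in esr while par has F=0. That would mean
the CPU (or qemu's emulation of AT) reports a valid translation for an
address that has just faulted.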

Also, Andrey localized this to the mte=on,virtualization=on combination.
Does this point towards a qemu bug?


