[syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

Dmitry Vyukov dvyukov at google.com
Fri Mar 12 15:12:16 GMT 2021


On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks <ben.dooks at codethink.co.uk> wrote:
>
> On 10/03/2021 17:16, Dmitry Vyukov wrote:
> > On Wed, Mar 10, 2021 at 5:46 PM syzbot
> > <syzbot+e74b94fe601ab9552d69 at syzkaller.appspotmail.com> wrote:
> >>
> >> Hello,
> >>
> >> syzbot found the following issue on:
> >>
> >> HEAD commit:    0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
> >> git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
> >> userspace arch: riscv64
> >>
> >> Unfortunately, I don't have any reproducer for this issue yet.
> >>
> >> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> >> Reported-by: syzbot+e74b94fe601ab9552d69 at syzkaller.appspotmail.com
> >
> > +riscv maintainers
> >
> > This is riscv64-specific.
> > I've seen similar crashes in put_user in other places. It looks like
> > put_user crashes if the user address is not mapped/protected (?).
>
> I've been having a look, and this seems to be down to access of the
> tsk->set_child_tid variable. I assume the fuzzing here is to pass a
> bad address to clone?
>
> From looking at the code, the put_user() code should have set the
> relevant SR_SUM bit (the value for this, 1<<18, is in the
> s2 register in the crash report), and from looking at the compiler
> output from my gcc-10, the code looks to be doing the relevant csrs
> and then csrc around the put_user.
>
> So currently I do not understand how the above could have happened
> other than something re-tried the code sequence and ended up retrying
> the faulting instruction without the SR_SUM bit set.

I would maybe blame qemu for randomly resetting SR_SUM, but it's
strange that 99% of these crashes are in schedule_tail. If it were
qemu, they would be more evenly distributed...
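
To make the hypothesis concrete, here is a toy user-space model of it.
This is not kernel code: every sim_* name is made up for illustration,
and sim_sstatus only stands in for the real sstatus CSR. It assumes
nothing beyond what is said above: put_user sets SR_SUM (1<<18) with
csrs, does the store, then clears it with csrc, and the store can only
succeed while the bit is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* SR_SUM is bit 18 of sstatus: 1 << 18 == 0x40000. */
#define SR_SUM (UINT64_C(1) << 18)

/* Toy stand-in for the sstatus CSR (hypothetical, not real kernel state). */
static uint64_t sim_sstatus;

/* Model the csrs / csrc instructions: set or clear bits in the CSR. */
static void sim_csrs(uint64_t mask) { sim_sstatus |= mask; }
static void sim_csrc(uint64_t mask) { sim_sstatus &= ~mask; }

/* In this model a store to user memory is allowed only while SUM is set. */
static bool sim_user_store_ok(void)
{
    return (sim_sstatus & SR_SUM) != 0;
}

/*
 * Hypothetical model of the put_user sequence. Returns true if the store
 * would succeed. sum_lost_in_between models "something" clearing SR_SUM
 * after the csrs but before the store is executed (or re-executed).
 */
static bool sim_put_user(bool sum_lost_in_between)
{
    bool ok;

    sim_csrs(SR_SUM);            /* open the user-access window */
    if (sum_lost_in_between)
        sim_csrc(SR_SUM);        /* the bit is lost inside the window */
    ok = sim_user_store_ok();    /* the store itself */
    sim_csrc(SR_SUM);            /* close the window */
    return ok;
}
```

In this model sim_put_user(false) succeeds and sim_put_user(true)
fails, which is all the hypothesis claims. It says nothing about what
would actually clear the bit.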

Another observation: looking at a dozen crash logs, in none of
these cases was the fuzzer actually trying to fuzz clone with some insane
arguments. So it looks like completely normal clone calls (e.g. coming
from pthread_create) result in this crash.

I also wonder why there is a ret_from_exception; is that normal? I see
that handle_exception disables SR_SUM:
https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73


