[syzbot] BUG: unable to handle kernel access to user memory in schedule_tail
Ben Dooks
ben.dooks at codethink.co.uk
Thu Mar 18 09:41:02 GMT 2021
On 12/03/2021 17:38, Dmitry Vyukov wrote:
> On Fri, Mar 12, 2021 at 6:34 PM Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>> On Fri, Mar 12, 2021 at 5:36 PM Ben Dooks <ben.dooks at codethink.co.uk> wrote:
>>>
>>> On 12/03/2021 16:34, Ben Dooks wrote:
>>>> On 12/03/2021 16:30, Ben Dooks wrote:
>>>>> On 12/03/2021 15:12, Dmitry Vyukov wrote:
>>>>>> On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks <ben.dooks at codethink.co.uk> wrote:
>>>>>>>
>>>>>>> On 10/03/2021 17:16, Dmitry Vyukov wrote:
>>>>>>>> On Wed, Mar 10, 2021 at 5:46 PM syzbot <syzbot+e74b94fe601ab9552d69 at syzkaller.appspotmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> syzbot found the following issue on:
>>>>>>>>>
>>>>>>>>> HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
>>>>>>>>> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
>>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
>>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
>>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
>>>>>>>>> userspace arch: riscv64
>>>>>>>>>
>>>>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>>>>>
>>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>>>>> Reported-by: syzbot+e74b94fe601ab9552d69 at syzkaller.appspotmail.com
>>>>>>>>
>>>>>>>> +riscv maintainers
>>>>>>>>
>>>>>>>> This is riscv64-specific.
>>>>>>>> I've seen similar crashes in put_user in other places. It looks like
>>>>>>>> put_user crashes if the user address is not mapped/protected (?).
>>>>>>>
>>>>>>> I've been having a look, and this seems to be down to access of the
>>>>>>> tsk->set_child_tid variable. I assume the fuzzing here is to pass a
>>>>>>> bad address to clone?
>>>>>>>
>>>>>>> From looking at the code, the put_user() code should have set the
>>>>>>> relevant SR_SUM bit (the value for this, which is 1<<18, is in the
>>>>>>> s2 register in the crash report), and from looking at the compiler
>>>>>>> output from my gcc-10, the code looks to be doing the relevant csrs
>>>>>>> and then csrc around the put_user.
>>>>>>>
>>>>>>> So currently I do not understand how the above could have happened,
>>>>>>> other than something re-trying the code sequence and ending up retrying
>>>>>>> the faulting instruction without the SR_SUM bit set.
>>>>>>
>>>>>> I would maybe blame qemu for randomly resetting SR_SUM, but it's
>>>>>> strange that 99% of these crashes are in schedule_tail. If it were
>>>>>> qemu, then they would be more evenly distributed...
>>>>>>
>>>>>> Another observation: looking at a dozen crash logs, in none of
>>>>>> these cases was the fuzzer actually trying to fuzz clone with some insane
>>>>>> arguments. So it looks like completely normal clones (e.g. coming
>>>>>> from pthread_create) result in this crash.
>>>>>>
>>>>>> I also wonder why there is ret_from_exception; is it normal? I see
>>>>>> handle_exception disables SR_SUM:
>>>>>> https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
>>>>>>
>>>>>
>>>>> So I think if SR_SUM is set, then it faults the access to user memory
>>>>> which the _user() routines clear to allow them access.
>>>>>
>>>>> I'm thinking there is at least one issue here:
>>>>>
>>>>> - the test in the fault handler is the wrong way around for die_kernel_fault()
>>>>> - the handler only catches this if the page has yet to be mapped.
>>>>>
>>>>> So I think the test should be:
>>>>>
>>>>> if (!user_mode(regs) && addr < TASK_SIZE &&
>>>>>     unlikely(regs->status & SR_SUM))
>>>>>
>>>>> This then should continue on and allow the rest of the handler to
>>>>> complete mapping the page if it is not there.
>>>>>
>>>>> I have been trying to create a very simple clone test, but so far it
>>>>> has yet to actually trigger anything.
>>>>
>>>> I should have added that there doesn't seem to be a good way to use
>>>> mmap() to allocate memory without inserting a vm-mapping after the mmap().
>>>>
>>> How difficult is it to try building a branch with the above test
>>> modified?
>>
>> I don't have access to hardware, and I don't have other qemu versions ready to use.
>> But I can teach you how to run syzkaller locally :)
>> I am not sure anybody has run it on real riscv hardware at all. When
>> Tobias ported syzkaller, I think Tobias also used qemu.
>>
>> I am now building with an inverted check to test locally.
>>
>> I don't fully understand this code, but does handle_exception
>> reset SR_SUM around do_page_fault? If so, then looking at SR_SUM in
>> do_page_fault won't work with either a positive or a negative check.
>
>
> The inverted check crashes during boot:
>
> --- a/arch/riscv/mm/fault.c
> +++ b/arch/riscv/mm/fault.c
> @@ -249,7 +249,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> flags |= FAULT_FLAG_USER;
>
> if (!user_mode(regs) && addr < TASK_SIZE &&
> - unlikely(!(regs->status & SR_SUM)))
> + unlikely(regs->status & SR_SUM))
> die_kernel_fault("access to user memory without uaccess routines",
> addr, regs);
>
>
> [ 77.349329][ T1] Run /sbin/init as init process
> [ 77.868371][ T1] Unable to handle kernel access to user memory without uaccess routines at virtual address 00000000000e8e39
> [ 77.870355][ T1] Oops [#1]
> [ 77.870766][ T1] Modules linked in:
> [ 77.871326][ T1] CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc2-00010-g0d7588ab9ef9-dirty #42
> [ 77.872057][ T1] Hardware name: riscv-virtio,qemu (DT)
> [ 77.872620][ T1] epc : __clear_user+0x36/0x4e
> [ 77.873285][ T1] ra : padzero+0x9c/0xb0
> [ 77.873849][ T1] epc : ffffffe000bb7136 ra : ffffffe0004f42a0 sp : ffffffe006f8fbc0
> [ 77.874438][ T1] gp : ffffffe005d25718 tp : ffffffe006f98000 t0 : 00000000000e8e40
> [ 77.875031][ T1] t1 : 00000000000e9000 t2 : 000000000001c49c s0 : ffffffe006f8fbf0
> [ 77.875618][ T1] s1 : 00000000000001c7 a0 : 00000000000e8e39 a1 : 00000000000001c7
> [ 77.876204][ T1] a2 : 0000000000000002 a3 : 00000000000e9000 a4 : ffffffe006f99000
> [ 77.876787][ T1] a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe00031c088
> [ 77.877367][ T1] s2 : 00000000000e8e39 s3 : 0000000000001000 s4 : 0000003ffffffe39
> [ 77.877952][ T1] s5 : 00000000000e8e39 s6 : 00000000000e9570 s7 : 00000000000e8e39
> [ 77.878535][ T1] s8 : 0000000000000001 s9 : 00000000000e8e39 s10: ffffffe00c65f608
> [ 77.879126][ T1] s11: ffffffe00816e8d8 t3 : ea3af0fa372b8300 t4 : 0000000000000003
> [ 77.879711][ T1] t5 : ffffffc401dc45d8 t6 : 0000000000040000
> [ 77.880209][ T1] status: 0000000000040120 badaddr: 00000000000e8e39 cause: 000000000000000f
> [ 77.880846][ T1] Call Trace:
> [ 77.881213][ T1] [<ffffffe000bb7136>] __clear_user+0x36/0x4e
> [ 77.881912][ T1] [<ffffffe0004f523e>] load_elf_binary+0xf8a/0x2400
> [ 77.882562][ T1] [<ffffffe0003e1802>] bprm_execve+0x5b0/0x1080
> [ 77.883145][ T1] [<ffffffe0003e38bc>] kernel_execve+0x204/0x288
> [ 77.883727][ T1] [<ffffffe003b70e94>] run_init_process+0x1fe/0x212
> [ 77.884337][ T1] [<ffffffe003b70ec6>] try_to_run_init_process+0x1e/0x66
> [ 77.884956][ T1] [<ffffffe003bc0864>] kernel_init+0x14a/0x200
> [ 77.885541][ T1] [<ffffffe000005570>] ret_from_exception+0x0/0x14
> [ 77.886955][ T1] ---[ end trace 1e934d07b8a4bed8 ]---
> [ 77.887705][ T1] Kernel panic - not syncing: Fatal exception
> [ 77.888333][ T1] SMP: stopping secondary CPUs
> [ 77.889357][ T1] Rebooting in 86400 seconds..
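As a reading aid for the register dump above, here is a minimal sketch that decodes the `status` and `cause` values, assuming the standard RISC-V privileged-spec encodings (sstatus SPIE = bit 5, SPP = bit 8, SUM = bit 18; exception code 15 = store/AMO page fault). `TASK_SIZE_SKETCH` and the `dies_*()` helpers are hypothetical userspace models of the original and inverted do_page_fault() tests, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* sstatus bits (RISC-V privileged spec). */
#define SR_SPIE (UINT64_C(1) << 5)  /* previous interrupt enable */
#define SR_SPP  (UINT64_C(1) << 8)  /* previous privilege: 1 = S-mode */
#define SR_SUM  (UINT64_C(1) << 18) /* S-mode may access U-mode pages */

/* scause exception code for a store/AMO page fault. */
#define EXC_STORE_PAGE_FAULT UINT64_C(15)

/* Hypothetical user/kernel split: only "small address = user" matters here. */
#define TASK_SIZE_SKETCH (UINT64_C(1) << 38)

bool sum_enabled(uint64_t status) { return (status & SR_SUM) != 0; }
bool trapped_from_smode(uint64_t status) { return (status & SR_SPP) != 0; }
bool is_store_page_fault(uint64_t cause) { return cause == EXC_STORE_PAGE_FAULT; }

/* Original test: the kernel touched user memory with SUM clear -> die. */
bool dies_original(bool user_mode, uint64_t addr, uint64_t status)
{
    return !user_mode && addr < TASK_SIZE_SKETCH && !sum_enabled(status);
}

/* Inverted test from the diff: die when SUM is set. */
bool dies_inverted(bool user_mode, uint64_t addr, uint64_t status)
{
    return !user_mode && addr < TASK_SIZE_SKETCH && sum_enabled(status);
}
```

With the values from the oops, 0x40120 decodes to SUM | SPP | SPIE, so SUM was set while __clear_user() was storing to user memory. The original test therefore does not fire, while the inverted one does, which is consistent with the boot crash.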
I have reproduced this on qemu, but have not yet managed to get the
real hardware working with this branch.
I now have a working hypothesis. Having added debug to check the
sstatus.SR_SUM flag and reviewed the assembly, I think this is
what is happening.
The C code for "put_user(func(), address)" is generating code to do:
1: __enable_user_access();
2: cpu_reg = func();
3: assembly for *address = cpu_reg;
4: __disable_user_access();
I think that, with all the sanitisers enabled, the call to func() can
schedule out. The __switch_to() code does not restore the original
status register, which means that if there is IO during the sleep,
SR_SUM may end up being cleared and never re-set. We then get back to
step 3 and fault, because step 2 cleared the SR_SUM bit set by step 1.
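The hypothesised failure can be modelled in plain userspace C. This is a sketch, not kernel code: `sim_sstatus` stands in for the hart's sstatus CSR, `sim_schedule()` models a switch that leaves SR_SUM cleared, and `sim_store()` reports where the real store would fault. All of these names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18)

/* Hypothetical stand-in for the hart's sstatus CSR. */
static uint64_t sim_sstatus;

static void sim_enable_user_access(void)  { sim_sstatus |= SR_SUM; }  /* like csrs */
static void sim_disable_user_access(void) { sim_sstatus &= ~SR_SUM; } /* like csrc */

/* Models a context switch that does not restore this task's SUM bit:
 * another task ran in the meantime and left SUM clear. */
static void sim_schedule(void) { sim_sstatus &= ~SR_SUM; }

/* Models func(): produces the value and may sleep while doing so. */
static int sim_func(bool may_sleep)
{
    if (may_sleep)
        sim_schedule();
    return 42;
}

/* Models the store: returns false where the real store would fault. */
static bool sim_store(int *dst, int val)
{
    if (!(sim_sstatus & SR_SUM))
        return false;
    *dst = val;
    return true;
}

/* The suspected code order for put_user(func(), address). */
bool put_user_buggy(int *dst, bool may_sleep)
{
    sim_enable_user_access();       /* 1 */
    int val = sim_func(may_sleep);  /* 2: may clear SUM */
    bool ok = sim_store(dst, val);  /* 3: "faults" if SUM was lost */
    sim_disable_user_access();      /* 4 */
    return ok;
}
```

Calling put_user_buggy() with may_sleep false succeeds; with may_sleep true, step 2 clears SUM before step 3 runs, which is the ordering problem described in the hypothesis.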
It is very possible no-one has seen this before: the functions
feeding put_user() are generally fairly small, so unless the system
is both under load and has some reason to schedule inside them, this
bug has probably been rare to unseen.
I think the correct solution is to store the SR_SUM bit status in
the thread_struct and make __switch_to() save/restore it when
changing between tasks/threads. Re-ordering the code to force
steps 1 and 2 to swap may reduce the bug's window.
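A minimal userspace model of the proposed save/restore: each thread carries a saved copy of its SUM bit, and the switch saves the outgoing thread's bit and installs the incoming thread's, leaving the other sstatus bits alone. `struct sim_thread`, `sim_switch_to()` and the accessors are hypothetical; under this proposal the real state would live in thread_struct and be handled by __switch_to().

```c
#include <assert.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18)

/* Hypothetical stand-in for the hart's sstatus CSR. */
static uint64_t sim_sstatus;

/* Hypothetical per-thread state: only the saved SUM bit is modelled. */
struct sim_thread {
    uint64_t saved_sum;
};

uint64_t sim_get_sstatus(void) { return sim_sstatus; }
void sim_set_sstatus(uint64_t v) { sim_sstatus = v; }

/* Save prev's SUM bit, then install next's; other sstatus bits are kept. */
void sim_switch_to(struct sim_thread *prev, struct sim_thread *next)
{
    assert((next->saved_sum & ~SR_SUM) == 0); /* only SUM is ever saved */
    prev->saved_sum = sim_sstatus & SR_SUM;
    sim_sstatus = (sim_sstatus & ~SR_SUM) | next->saved_sum;
}
```

Switching from a thread that is in the middle of put_user() to a fresh thread clears SUM for the newcomer, and switching back restores it, so the store in step 3 would still see SUM set.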
Thinking further about the order of steps 1 and 2, we should probably
fix that order so that func() is not run with the user-space access
protection disabled.
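The re-ordering can be sketched as a macro that evaluates the value into a temporary before opening the user-access window, so only the store itself runs with SUM set. `PUT_USER_SKETCH` and the `sim_*` helpers are hypothetical userspace stand-ins, not the kernel's put_user(); `__typeof__` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18)

static uint64_t sim_sstatus;  /* hypothetical stand-in for sstatus */
static int sum_seen_by_value; /* SUM state observed by the argument */

static void sim_enable_user_access(void)
{
    assert(!(sim_sstatus & SR_SUM)); /* window must not already be open */
    sim_sstatus |= SR_SUM;
}

static void sim_disable_user_access(void) { sim_sstatus &= ~SR_SUM; }

/* Evaluate x first (it may sleep), then open the window only for the store. */
#define PUT_USER_SKETCH(x, ptr)                  \
    do {                                         \
        __typeof__(*(ptr)) put_val_ = (x);       \
        sim_enable_user_access();                \
        *(ptr) = put_val_;                       \
        sim_disable_user_access();               \
    } while (0)

/* Example argument expression: records whether user access was open. */
int sim_value(void)
{
    sum_seen_by_value = (sim_sstatus & SR_SUM) != 0;
    return 7;
}

int sim_sum_seen_by_value(void) { return sum_seen_by_value; }
uint64_t sim_get_sstatus(void) { return sim_sstatus; }
```

After PUT_USER_SKETCH(sim_value(), &x), sim_value() has observed SUM clear, i.e. the argument expression ran with user-space access protection still enabled, and SUM is clear again once the store completes.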
I'll try to make some sort of small test case to avoid having
to run syz-stress to provoke this.
--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius
https://www.codethink.co.uk/privacy.html
More information about the linux-riscv mailing list