syzkaller on risc-v

Palmer Dabbelt palmer at dabbelt.com
Mon Jul 13 21:21:21 EDT 2020


On Mon, 06 Jul 2020 03:12:05 PDT (-0700), colin.king at canonical.com wrote:
> FYI, increasing the THREAD_SIZE_ORDER to 2 fixes the gcov stack crashes
> I'm seeing on a 5.4 kernel.

Sorry I'm a bit slow here, but setting THREAD_SIZE_ORDER to 2 on the
64-bit targets seems in line with what everyone else is doing.  IIRC this is
essentially the kernel stack size, which does tend to be larger on 64-bit
platforms, so it seems reasonable.  I wouldn't be terribly surprised if we
also have larger stacks on rv32 than other platforms do, but I think it's best
to avoid increasing the size over there without at least seeing some failures.

I've just sent out a patch, as I don't see one in my inbox.

Thanks!

>
> On 30/06/2020 14:57, David Abdurachmanov wrote:
>> On Tue, Jun 30, 2020 at 4:38 PM Colin Ian King <colin.king at canonical.com> wrote:
>>>
>>> I believe I'm also seeing some potential stack smashing issues in the
>>> lua engine in ZFS on risc-v. It is taking a while for me to debug, but I
>>> don't see the failure on other arches.  Is there a way to bump the stack
>>> size up temporarily to test with larger stacks on risc-v?
>>
>> Dmitry wrote in the original email that the following solves issues with
>> KCOV enabled:
>>
>> --- a/arch/riscv/include/asm/thread_info.h
>> +++ b/arch/riscv/include/asm/thread_info.h
>> -#define THREAD_SIZE_ORDER      (1)
>> +#define THREAD_SIZE_ORDER      (2)
>>
>> I see MIPS have:
>>
>> [..]
>> /* thread information allocation */
>> #if defined(CONFIG_PAGE_SIZE_4KB) && defined(CONFIG_32BIT)
>> #define THREAD_SIZE_ORDER (1)
>> #endif
>> #if defined(CONFIG_PAGE_SIZE_4KB) && defined(CONFIG_64BIT)
>> #define THREAD_SIZE_ORDER (2)
>> [..]
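Following the MIPS pattern quoted above, the analogous riscv guard would
presumably look something like this (a hypothetical sketch only, not the
actual patch; riscv keys off CONFIG_64BIT rather than a page-size option):

```c
/* arch/riscv/include/asm/thread_info.h -- hypothetical sketch following
 * the MIPS pattern above; the real patch may differ. */
#ifdef CONFIG_64BIT
#define THREAD_SIZE_ORDER	(2)
#else
#define THREAD_SIZE_ORDER	(1)
#endif
```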
>>
>> david
>>
>>>
>>> Colin
>>>
>>> On 30/06/2020 14:26, David Abdurachmanov wrote:
>>>> On Tue, Jun 30, 2020 at 4:04 PM Andreas Schwab <schwab at suse.de> wrote:
>>>>>
>>>>> On Jun 30 2020, Dmitry Vyukov wrote:
>>>>>
>>>>>> I would assume some stack overflows can happen without KCOV as well.
>>>>>
>>>>> Yes, I see stack overflows quite a lot, like this:
>>>>>
>>>>> [62192.908680] Kernel panic - not syncing: corrupted stack end detected inside scheduler
>>>>> [62192.915752] CPU: 0 PID: 12347 Comm: ld Not tainted 5.7.5-221-default #1 openSUSE Tumbleweed (unreleased)
>>>>> [62192.925204] Call Trace:
>>>>> [62192.927646] [<ffffffe0002028ae>] walk_stackframe+0x0/0xaa
>>>>> [62192.933030] [<ffffffe000202b76>] show_stack+0x2a/0x34
>>>>> [62192.938066] [<ffffffe000557d44>] dump_stack+0x6e/0x88
>>>>> [62192.943098] [<ffffffe00020c2d2>] panic+0xe8/0x26a
>>>>> [62192.947785] [<ffffffe00085ab9c>] schedule+0x0/0xb2
>>>>> [62192.952561] [<ffffffe00085af36>] _cond_resched+0x32/0x44
>>>>> [62192.957859] [<ffffffe0002f18ea>] invalidate_mapping_pages+0xe0/0x1ce
>>>>> [62192.964193] [<ffffffe000370aa4>] inode_lru_isolate+0x238/0x298
>>>>> [62192.970012] [<ffffffe000308098>] __list_lru_walk_one+0x5e/0xf6
>>>>> [62192.975826] [<ffffffe000308516>] list_lru_walk_one+0x42/0x98
>>>>> [62192.981470] [<ffffffe0003717e8>] prune_icache_sb+0x32/0x72
>>>>> [62192.986941] [<ffffffe000358366>] super_cache_scan+0xe4/0x13e
>>>>> [62192.992586] [<ffffffe0002f1fac>] do_shrink_slab+0x10e/0x17e
>>>>> [62192.998142] [<ffffffe0002f2126>] shrink_slab_memcg+0x10a/0x1de
>>>>> [62193.003957] [<ffffffe0002f5314>] shrink_node_memcgs+0x12e/0x1a4
>>>>> [62193.009861] [<ffffffe0002f5484>] shrink_node+0xfa/0x43c
>>>>> [62193.015067] [<ffffffe0002f583e>] shrink_zones+0x78/0x18c
>>>>> [62193.020365] [<ffffffe0002f59f0>] do_try_to_free_pages+0x9e/0x23e
>>>>> [62193.026352] [<ffffffe0002f65ac>] try_to_free_pages+0xb2/0xf4
>>>>> [62193.031991] [<ffffffe000322952>] __alloc_pages_slowpath.constprop.0+0x2d0/0x6c2
>>>>> [62193.039284] [<ffffffe000322e9a>] __alloc_pages_nodemask+0x156/0x1b2
>>>>> [62193.045535] [<ffffffe00030c730>] do_anonymous_page+0x58/0x41c
>>>>> [62193.051266] [<ffffffe00030f50e>] handle_pte_fault+0x12e/0x156
>>>>> [62193.056994] [<ffffffe000310444>] __handle_mm_fault+0xca/0x118
>>>>> [62193.062725] [<ffffffe000310532>] handle_mm_fault+0xa0/0x152
>>>>> [62193.068278] [<ffffffe0002055ba>] do_page_fault+0xd6/0x370
>>>>> [62193.073666] [<ffffffe00020140a>] ret_from_exception+0x0/0xc
>>>>> [62193.079222] [<ffffffe0004fc16a>] copy_page_to_iter_iovec+0x4c/0x154
>>>>
>>>> There was a report from Canonical that enabling gcov causes similar issues.
>>>>
>>>> linux: riscv: corrupted stack detected inside scheduler
>>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1877954
>>>>
>>>> Adding Colin to CC. So far we couldn't reproduce this locally, I guess
>>>> because we don't have the right config.
>>>>
>>>> david
>>>>
>>>>
>>>>>
>>>>> or this:
>>>>>
>>>>> [200460.114397] Kernel panic - not syncing: corrupted stack end detected inside scheduler
>>>>> [200460.121553] CPU: 0 PID: 32619 Comm: sh Not tainted 5.7.5-221-default #1 openSUSE Tumbleweed (unreleased)
>>>>> [200460.131090] Call Trace:
>>>>> [200460.133623] [<ffffffe0002028ae>] walk_stackframe+0x0/0xaa
>>>>> [200460.139091] [<ffffffe000202b76>] show_stack+0x2a/0x34
>>>>> [200460.144212] [<ffffffe000557d44>] dump_stack+0x6e/0x88
>>>>> [200460.149335] [<ffffffe00020c2d2>] panic+0xe8/0x26a
>>>>> [200460.154109] [<ffffffe00085ab9c>] schedule+0x0/0xb2
>>>>> [200460.158969] [<ffffffe00085af36>] _cond_resched+0x32/0x44
>>>>> [200460.164348] [<ffffffe000498572>] aa_sk_perm+0x38/0x138
>>>>> [200460.169559] [<ffffffe00048d4b4>] apparmor_socket_sendmsg+0x18/0x20
>>>>> [200460.175817] [<ffffffe0004508e0>] security_socket_sendmsg+0x2a/0x42
>>>>> [200460.182061] [<ffffffe0006f4c0a>] sock_sendmsg+0x1a/0x40
>>>>> [200460.195979] [<ffffffdf817210cc>] xprt_sock_sendmsg+0xb2/0x2b6 [sunrpc]
>>>>> [200460.210450] [<ffffffdf81723bde>] xs_tcp_send_request+0xc6/0x206 [sunrpc]
>>>>> [200460.224930] [<ffffffdf8171f538>] xprt_request_transmit.constprop.0+0x88/0x218 [sunrpc]
>>>>> [200460.240731] [<ffffffdf81720610>] xprt_transmit+0x9a/0x182 [sunrpc]
>>>>> [200460.254858] [<ffffffdf8171a584>] call_transmit+0x68/0xb8 [sunrpc]
>>>>> [200460.268817] [<ffffffdf81726660>] __rpc_execute+0x84/0x222 [sunrpc]
>>>>> [200460.282787] [<ffffffdf81726cea>] rpc_execute+0xac/0xb8 [sunrpc]
>>>>> [200460.296493] [<ffffffdf8171c5ca>] rpc_run_task+0x122/0x178 [sunrpc]
>>>>> [200460.314422] [<ffffffdf82e1533a>] nfs4_do_call_sync+0x64/0x84 [nfsv4]
>>>>> [200460.332514] [<ffffffdf82e1541c>] _nfs4_proc_getattr+0xc2/0xd4 [nfsv4]
>>>>> [200460.350813] [<ffffffdf82e1cafc>] nfs4_proc_getattr+0x48/0x72 [nfsv4]
>>>>> [200460.363307] [<ffffffdf8292c1f6>] __nfs_revalidate_inode+0x104/0x2c8 [nfs]
>>>>> [200460.376204] [<ffffffdf82926d18>] nfs_access_get_cached+0x104/0x212 [nfs]
>>>>> [200460.389112] [<ffffffdf82926f20>] nfs_do_access+0xfa/0x178 [nfs]
>>>>> [200460.401176] [<ffffffdf82927070>] nfs_permission+0x8e/0x184 [nfs]
>>>>> [200460.406497] [<ffffffe000361936>] inode_permission.part.0+0x78/0x118
>>>>> [200460.412838] [<ffffffe0003638ea>] link_path_walk.part.0+0x1bc/0x212
>>>>> [200460.419086] [<ffffffe000363c7e>] path_lookupat+0x34/0x172
>>>>> [200460.424559] [<ffffffe0003653de>] filename_lookup+0x5c/0xf4
>>>>> [200460.430114] [<ffffffe00036551e>] user_path_at_empty+0x3a/0x5e
>>>>> [200460.435931] [<ffffffe00035b838>] vfs_statx+0x62/0xbc
>>>>> [200460.440966] [<ffffffe00035b92a>] __do_sys_newfstatat+0x24/0x3a
>>>>> [200460.446870] [<ffffffe00035bafa>] sys_newfstatat+0x10/0x18
>>>>> [200460.452339] [<ffffffe0002013fc>] ret_from_syscall+0x0/0x2
>>>>>
>>>>> Andreas.
>>>>>
>>>>> --
>>>>> Andreas Schwab, SUSE Labs, schwab at suse.de
>>>>> GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
>>>>> "And now for something completely different."
>>>>>
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv at lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>


