[syzbot] KASAN: invalid-access Read in copy_page

Thu Oct 6 03:24:30 PDT 2022

On Wed, Oct 05, 2022 at 01:38:55PM +0100, James Morse wrote:
> On 27/09/2022 17:55, Andrey Konovalov wrote:
> > On Tue, Sep 6, 2022 at 6:23 PM Catalin Marinas <catalin.marinas at arm.com> wrote:
> >> On Tue, Sep 06, 2022 at 04:39:57PM +0200, Andrey Konovalov wrote:
> >>> On Tue, Sep 6, 2022 at 4:29 PM Catalin Marinas <catalin.marinas at arm.com> wrote:
> >>>>>> Does it take long to reproduce this kasan warning?
> >>>>>
> >>>>> syzbot finds several such cases every day (200 crashes for the past 35 days):
> >>>>> https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
> >>>>> So once it reaches the tested tree, we should have an answer within a day.
[...]
> I've reproduced this with the latest qemu and v6.0 kernel using ubuntu 15.04 user-space.
> 
> The reproducer is just to log in once its booted. The vm has swap, and I've turned the
> memory down low enough to force it to swap. The round trip time is about 15 minutes.
> 
> I've not managed to reproduce it without swap, or with more memory. (but it may be a
> timing thing)

Thanks James. I got the error without swap enabled. Just booted Debian
under Qemu with 256MB of RAM (no graphics), did an 'ls -lR /' and it
triggered shortly after. There's no MTE used in user-space.

==================================================================
BUG: KASAN: invalid-access in copy_page+0x10/0xd0
Read at addr f9ff0000050ba000 by task kcompactd0/28
Pointer tag: [f9], memory tag: [f8]

CPU: 0 PID: 28 Comm: kcompactd0 Tainted: G        W          6.0.0-rc3-dirty #1
Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
Call trace:
 dump_backtrace.part.0+0xdc/0xf0
 show_stack+0x1c/0x4c
 dump_stack_lvl+0x68/0x84
 print_report+0x104/0x610
 kasan_report+0x90/0xb0
 __do_kernel_fault+0x70/0x194
 do_tag_check_fault+0x7c/0x90
 do_mem_abort+0x48/0xa0
 el1_abort+0x40/0x60
 el1h_64_sync_handler+0xdc/0xec
 el1h_64_sync+0x64/0x68
 copy_page+0x10/0xd0
 folio_copy+0x50/0xb0
 migrate_folio+0x50/0x9c
 move_to_new_folio+0xc0/0x1d4
 migrate_pages+0x16b4/0x1740
 compact_zone+0x66c/0xb0c
 proactive_compact_node+0x70/0xac
 kcompactd+0x1b4/0x370
 kthread+0x110/0x114
 ret_from_fork+0x10/0x20

The buggy address belongs to the physical page:
page:000000007339140a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff90019 pfn:0x450ba
memcg:f9ff0000052e4000
anon flags: 0x3fffc180088000d(locked|uptodate|dirty|swapbacked|arch_2|node=0|zone=0|lastcpupid=0xffff|kasantag=0x6)
raw: 03fffc180088000d fffffc0000142e48 ffff80000815bd68 fdff000001c738c1
raw: 0000000ffff90019 0000000000000000 00000001ffffffff f9ff0000052e4000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff0000050b9e00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
 ffff0000050b9f00: f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9 f9
>ffff0000050ba000: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
                   ^
 ffff0000050ba100: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
 ffff0000050ba200: f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8 f8
==================================================================

It looks like it always happens on read. Something updated the tag in
page->flags for an existing page (or repainted the page, though less
likely as I think the page is in use).

I'm surprised that even without MTE in user-space, we still get
PG_mte_tagged (arch_2) set. Time for more printks.

-- 
Catalin