BUG: KCSAN: data-race in folio_batch_move_lru / mpage_read_end_io

Mirsad Todorovac mirsad.todorovac at alu.unizg.hr
Tue Sep 12 09:05:06 PDT 2023



On 9/8/23 17:25, Matthew Wilcox wrote:
> On Thu, Aug 31, 2023 at 03:52:49PM +0100, Matthew Wilcox wrote:
>>>   read to 0xffffef9a44978bc0 of 8 bytes by task 348 on cpu 12:
>>>   folio_batch_move_lru (./include/linux/mm.h:1814 ./include/linux/mm.h:1824 ./include/linux/memcontrol.h:1636 ./include/linux/memcontrol.h:1659 mm/swap.c:216)
>>>   folio_batch_add_and_move (mm/swap.c:235)
>>>   folio_add_lru (./arch/x86/include/asm/preempt.h:95 mm/swap.c:518)
>>>   folio_add_lru_vma (mm/swap.c:538)
>>>   do_anonymous_page (mm/memory.c:4146)
>>
>> This is the part I don't understand.  The path to calling
>> folio_add_lru_vma() comes directly from vma_alloc_zeroed_movable_folio():
>>
> [snip]
>>
>> (sorry that's a lot of lines).  But there's _nowhere_ there that sets
>> PG_locked.  It's a freshly allocated page; all page flags (that are
>> actually flags; ignore the stuff up at the top) should be clear.  We
>> even check that with PAGE_FLAGS_CHECK_AT_PREP.  Plus, it doesn't
>> make sense that we'd start I/O; the page is freshly allocated, full of
>> zeroes; there's no backing store to read the page from.
>>
>> It really feels like this page was freed while it was still under I/O
>> and it's been reallocated to this victim process.
>>
>> I'm going to try a few things and see if I can figure this out.
> 
> I'm having trouble reproducing this.  Can you get it to happen reliably?
> 
> This is what I'm currently running with, and it doesn't trigger.
> I'd expect it to if we were going to hit the KCSAN bug.
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 0c5be12f9336..d22e8798c326 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4439,6 +4439,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
>   	page = __alloc_pages_slowpath(alloc_gfp, order, &ac);
>   
>   out:
> +	VM_BUG_ON_PAGE(page && (page->flags & (PAGE_FLAGS_CHECK_AT_PREP &~ (1 << PG_head))), page);
>   	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
>   	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
>   		__free_pages(page, order);

Hello, Mr. Matthew,

I have applied your patch to 6.6-rc1 torvalds tree vanilla kernel, and so far there was not KCSAN
report on folio_batch_move_lru() repeated. :-/

I realise that it would be good if this is made reproducible, so I will try a couple of other things ...

I am not that familiar with the memory management core, so you should probably tell me what to expect.

So far I got the best match (but probably unrelated) from this KCSAN report
(from secondary_startup_64_no_verify() to bio_endio() the stacktrace is the same):

[   72.708078] ==================================================================
[   72.708094] BUG: KCSAN: data-race in xas_clear_mark / xas_find_marked

[   72.708113] write to 0xffff888105d5dd90 of 8 bytes by interrupt on cpu 19:
[   72.708123] xas_clear_mark (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/bitops.h:178 /home/marvin/linux/kernel/torvalds2/./include/asm-generic/bitops/instrumented-non-atomic.h:115 /home/marvin/linux/kernel/torvalds2/lib/xarray.c:102 /home/marvin/linux/kernel/torvalds2/lib/xarray.c:914)
[   72.708134] __xa_clear_mark (/home/marvin/linux/kernel/torvalds2/lib/xarray.c:1929)
[   72.708143] __folio_end_writeback (/home/marvin/linux/kernel/torvalds2/mm/page-writeback.c:2960)
[   72.708155] folio_end_writeback (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:1613)
[   72.708164] end_page_writeback (/home/marvin/linux/kernel/torvalds2/mm/folio-compat.c:28)
[   72.708175] btrfs_page_clear_writeback (/home/marvin/linux/kernel/torvalds2/fs/btrfs/subpage.c:646) btrfs
[   72.708758] extent_buffer_write_end_io (/home/marvin/linux/kernel/torvalds2/./include/linux/bio.h:84 /home/marvin/linux/kernel/torvalds2/fs/btrfs/extent_io.c:1622) btrfs
[   72.709343] __btrfs_bio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:120) btrfs
[   72.709921] btrfs_orig_bbio_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:164) btrfs
[   72.710521] btrfs_orig_write_end_io (/home/marvin/linux/kernel/torvalds2/fs/btrfs/bio.c:420) btrfs
[   72.711118] bio_endio (/home/marvin/linux/kernel/torvalds2/block/bio.c:1603)
[   72.711128] blk_mq_end_request_batch (/home/marvin/linux/kernel/torvalds2/block/blk-mq.c:851 /home/marvin/linux/kernel/torvalds2/block/blk-mq.c:1089)
[   72.711139] nvme_pci_complete_batch (/home/marvin/linux/kernel/torvalds2/drivers/nvme/host/pci.c:986) nvme
[   72.711185] nvme_irq (/home/marvin/linux/kernel/torvalds2/drivers/nvme/host/pci.c:1086) nvme
[   72.711226] __handle_irq_event_percpu (/home/marvin/linux/kernel/torvalds2/kernel/irq/handle.c:158)
[   72.711239] handle_irq_event (/home/marvin/linux/kernel/torvalds2/kernel/irq/handle.c:195 /home/marvin/linux/kernel/torvalds2/kernel/irq/handle.c:210)
[   72.711251] handle_edge_irq (/home/marvin/linux/kernel/torvalds2/kernel/irq/chip.c:833)
[   72.711262] __common_interrupt (/home/marvin/linux/kernel/torvalds2/./include/linux/irqdesc.h:161 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/irq.c:238 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/irq.c:257)
[   72.711272] common_interrupt (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/irq.c:247 (discriminator 14))
[   72.711282] asm_common_interrupt (/home/marvin/linux/kernel/torvalds2/./arch/x86/include/asm/idtentry.h:636)
[   72.711292] cpuidle_enter_state (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:291)
[   72.711301] cpuidle_enter (/home/marvin/linux/kernel/torvalds2/drivers/cpuidle/cpuidle.c:390)
[   72.711309] call_cpuidle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:135)
[   72.711320] do_idle (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:219 /home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:282)
[   72.711328] cpu_startup_entry (/home/marvin/linux/kernel/torvalds2/kernel/sched/idle.c:378 (discriminator 1))
[   72.711337] start_secondary (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:210 /home/marvin/linux/kernel/torvalds2/arch/x86/kernel/smpboot.c:294)
[   72.711349] secondary_startup_64_no_verify (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/head_64.S:433)

[   72.711366] read to 0xffff888105d5dd90 of 8 bytes by task 555 on cpu 17:
[   72.711377] xas_find_marked (/home/marvin/linux/kernel/torvalds2/./include/linux/xarray.h:1724 /home/marvin/linux/kernel/torvalds2/lib/xarray.c:1354)
[   72.711387] filemap_get_folios_tag (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:1978 /home/marvin/linux/kernel/torvalds2/mm/filemap.c:2266)
[   72.711396] __filemap_fdatawait_range (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:516)
[   72.711405] filemap_fdatawait_range (/home/marvin/linux/kernel/torvalds2/mm/filemap.c:553)
[   72.711414] __btrfs_wait_marked_extents.isra.0 (/home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:1150) btrfs
[   72.712027] btrfs_write_and_wait_transaction (/home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:1169 /home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:1218) btrfs
[   72.712639] btrfs_commit_transaction (/home/marvin/linux/kernel/torvalds2/fs/btrfs/transaction.c:2500) btrfs
[   72.713251] transaction_kthread (/home/marvin/linux/kernel/torvalds2/fs/btrfs/disk-io.c:1537) btrfs
[   72.713856] kthread (/home/marvin/linux/kernel/torvalds2/kernel/kthread.c:388)
[   72.713865] ret_from_fork (/home/marvin/linux/kernel/torvalds2/arch/x86/kernel/process.c:147)
[   72.713876] ret_from_fork_asm (/home/marvin/linux/kernel/torvalds2/arch/x86/entry/entry_64.S:312)

[   72.713889] value changed: 0x0f00c00fff000000 -> 0x0000000fff000000

[   72.713903] Reported by Kernel Concurrency Sanitizer on:
[   72.713910] CPU: 17 PID: 555 Comm: btrfs-transacti Tainted: G             L     6.6.0-rc1-kcsan-dirty #2
[   72.713920] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[   72.713927] ==================================================================

However, I have upgraded to 6.6-rc1 torvalds tree kernel in the meantime.

If you want me, I could test with the 6.5 + your patch.

Best regards,
Mirsad Todorovac



More information about the Linux-nvme mailing list