[PATCH v2 1/1] kasan: fix memory leak of kasan quarantine

Shinichiro Kawasaki shinichiro.kawasaki at wdc.com
Mon Dec 28 08:25:45 EST 2020


On Dec 17, 2020 / 20:18, Kuan-Ying Lee wrote:
> When cpu is going offline, set q->offline as true
> and interrupt happened. The interrupt may call the
> quarantine_put. But quarantine_put do not free the
> the object. The object will cause memory leak.
> 
> Add qlink_free() to free the object.
> 
> Fixes: 6c82d45c7f03 (kasan: fix object remaining in offline per-cpu quarantine)
> Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee at mediatek.com>
> Cc: Andrey Ryabinin <aryabinin at virtuozzo.com>
> Cc: Alexander Potapenko <glider at google.com>
> Cc: Dmitry Vyukov <dvyukov at google.com>
> Cc: Andrew Morton <akpm at linux-foundation.org>
> Cc: Matthias Brugger <matthias.bgg at gmail.com>
> Cc: <stable at vger.kernel.org>    [5.10-]
> ---
>  mm/kasan/quarantine.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> index 0e3f8494628f..cac7c617df72 100644
> --- a/mm/kasan/quarantine.c
> +++ b/mm/kasan/quarantine.c
> @@ -191,6 +191,7 @@ void quarantine_put(struct kasan_free_meta *info, struct kmem_cache *cache)
>  
>  	q = this_cpu_ptr(&cpu_quarantine);
>  	if (q->offline) {
> +		qlink_free(&info->quarantine_link, cache);
>  		local_irq_restore(flags);
>  		return;
>  	}

I ran blktests with kernels v5.10.0 and v5.10.3 enabling KASAN, and observed a
BUG message [1]. The BUG can be recreated with a test case which hotplugs CPUs
during I/O to dm-linear device. The stack trace in the message indicates memory
leak was detected when slab of dm-linear bio set was destroyed.

I bisected and found the commit 6c82d45c7f03 ("kasan: fix object remaining in
offline per-cpu quarantine") triggers the BUG message. I also tried this fix
patch by Kuan-Ying and observed that it avoids the BUG message. I suppose the
fix is required for v5.10.x. Confirmation by kasan maintainers will be
appreciated.

I took following steps to recreate the BUG message:

1. Create dm-linear device on top of HDD (dmsetup create).
2. Run blktests' test case block/008 using the dm-linear device.
3. Remove the dm-linear device (dmsetup remove)

When I repeat the steps, the BUG message is often observed at the step 3.

I repeated the steps with v5.11-rc1 also, and did not observe the BUG. With
v5.11-rc1, quarantine_put() returns bool. Then I think the fix patch is not
required for v5.11, probably.

Wish this report helps for the fix.

[1]

[  151.201998] =============================================================================
[  151.212580] BUG bio-3 (Not tainted): Objects remaining in bio-3 on __kmem_cache_shutdown()
[  151.222321] -----------------------------------------------------------------------------

[  151.234933] Disabling lock debugging due to kernel taint
[  151.241634] INFO: Slab 0x0000000010690f30 objects=36 used=3 fp=0x00000000e7351615 flags=0x17ffffc0010200
[  151.252558] CPU: 6 PID: 1996 Comm: dmsetup Tainted: G    B             5.10.3 #21
[  151.261520] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0 04/24/2015
[  151.270070] Call Trace:
[  151.274029]  dump_stack+0x9a/0xcc
[  151.278844]  slab_err+0xb7/0xdc
[  151.283505]  ? do_raw_spin_lock+0x115/0x240
[  151.289220]  ? rwlock_bug.part.0+0x90/0x90
[  151.294833]  __kmem_cache_shutdown.cold+0x36/0x19f
[  151.301182]  kmem_cache_destroy+0x5d/0x110
[  151.306822]  bio_put_slab+0xd3/0x180
[  151.311952]  bioset_exit+0xa5/0x100
[  151.316989]  cleanup_mapped_device+0x5e/0x310
[  151.322886]  free_dev+0xb8/0x210
[  151.327665]  __dm_destroy+0x2e0/0x470
[  151.332872]  ? dm_blk_report_zones+0x2b0/0x2b0
[  151.338877]  ? _raw_spin_unlock+0x1f/0x30
[  151.344481]  dev_remove+0x223/0x2f0
[  151.349565]  ctl_ioctl+0x384/0x970
[  151.354572]  ? remove_all+0x90/0x90
[  151.359677]  ? find_held_lock+0x2c/0x110
[  151.365211]  ? free_params+0x30/0x30
[  151.370401]  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
[  151.377181]  dm_ctl_ioctl+0xa/0x10
[  151.382224]  __x64_sys_ioctl+0x127/0x190
[  151.387800]  do_syscall_64+0x33/0x40
[  151.393017]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  151.399713] RIP: 0033:0x7f6fe944338b
[  151.404947] Code: 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b5 e8 1c ff ff ff 85 c0 78 ba 4c 89 e0 5b 5d 41 5c c3 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd ba 0c 00 f7 d8 64 89 01 48
[  151.426499] RSP: 002b:00007ffd00632f08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[  151.435871] RAX: ffffffffffffffda RBX: 00007f6fe9523f30 RCX: 00007f6fe944338b
[  151.444829] RDX: 0000561f9b938440 RSI: 00000000c138fd04 RDI: 0000000000000003
[  151.453812] RBP: 00007f6fe9560494 R08: 00007f6fe9562de0 R09: 00007ffd00632d70
[  151.462808] R10: 0000000000000000 R11: 0000000000000202 R12: 0000561f9b936c40
[  151.471820] R13: 0000561f9b938470 R14: 0000561f9b9384f0 R15: 0000561f9b938440
[  151.480836] INFO: Object 0x00000000f6e5c796 @offset=5824
[  151.488063] INFO: Object 0x00000000a14f36d2 @offset=6720
[  151.495274] INFO: Object 0x000000005bfef957 @offset=9856

-- 
Best Regards,
Shin'ichiro Kawasaki


More information about the Linux-mediatek mailing list