[PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme multipath

Sagi Grimberg sagi at grimberg.me
Tue Mar 21 04:09:03 PDT 2023



On 3/21/23 12:50, Lei Lei2 Yin wrote:
>  From b134e7930b50679ce48e5522ddd37672b1802340 Mon Sep 17 00:00:00 2001
> From: Lei Yin <yinlei2 at lenovo.com>
> Date: Tue, 21 Mar 2023 16:09:08 +0800
> Subject: [PATCH] nvme: fix heap-use-after-free and oops in bio_endio for nvme
>   multipath
> 
> When blk_queue_split works in nvme_ns_head_submit_bio, input bio will be
> splited to two bios. If parent bio is completed first, and the bi_disk
> in parent bio is kfreed by nvme_free_ns, child will access this freed
> bi_disk in bio_endio. This will trigger heap-use-after-free or null
> pointer oops.

Can you explain further? It is unclear to me how we can delete the ns
gendisk

> 
> The following is kasan report:
> 
> BUG: KASAN: use-after-free in bio_endio+0x477/0x500
> Read of size 8 at addr ffff888106f2e3a8 by task kworker/1:1H/241
> 
> CPU: 1 PID: 241 Comm: kworker/1:1H Kdump: loaded Tainted: G           O
>        5.10.167 #1
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Workqueue: kblockd nvme_requeue_work [nvme_core]
> Call Trace:
>   dump_stack+0x92/0xc4
>   ? bio_endio+0x477/0x500
>   print_address_description.constprop.7+0x1e/0x230
>   ? record_print_text.cold.40+0x11/0x11
>   ? _raw_spin_trylock_bh+0x120/0x120
>   ? blk_throtl_bio+0x225/0x3050
>   ? bio_endio+0x477/0x500
>   ? bio_endio+0x477/0x500
>   kasan_report.cold.9+0x37/0x7c
>   ? bio_endio+0x477/0x500
>   bio_endio+0x477/0x500
>   nvme_ns_head_submit_bio+0x950/0x1130 [nvme_core]
>   ? nvme_find_path+0x7f0/0x7f0 [nvme_core]
>   ? __kasan_slab_free+0x11a/0x150
>   ? bio_endio+0x213/0x500
>   submit_bio_noacct+0x2a4/0xd10
>   ? _dev_info+0xcd/0xff
>   ? _dev_notice+0xff/0xff
>   ? blk_queue_enter+0x6c0/0x6c0
>   ? _raw_spin_lock_irq+0x81/0xd5
>   ? _raw_spin_lock+0xd0/0xd0
>   nvme_requeue_work+0x144/0x18c [nvme_core]
>   process_one_work+0x878/0x13e0
>   worker_thread+0x87/0xf70
>   ? __kthread_parkme+0x8f/0x100
>   ? process_one_work+0x13e0/0x13e0
>   kthread+0x30f/0x3d0
>   ? kthread_parkme+0x80/0x80
>   ret_from_fork+0x1f/0x30
> 
> Allocated by task 52:
>   kasan_save_stack+0x19/0x40
>   __kasan_kmalloc.constprop.11+0xc8/0xd0
>   __alloc_disk_node+0x5c/0x320
>   nvme_alloc_ns+0x6e9/0x1520 [nvme_core]
>   nvme_validate_or_alloc_ns+0x17c/0x370 [nvme_core]
>   nvme_scan_work+0x2d4/0x4d0 [nvme_core]
>   process_one_work+0x878/0x13e0
>   worker_thread+0x87/0xf70
>   kthread+0x30f/0x3d0
>   ret_from_fork+0x1f/0x30
> 
> Freed by task 54:
>   kasan_save_stack+0x19/0x40
>   kasan_set_track+0x1c/0x30
>   kasan_set_free_info+0x1b/0x30
>   __kasan_slab_free+0x108/0x150
>   kfree+0xa7/0x300
>   device_release+0x98/0x210
>   kobject_release+0x109/0x3a0
>   nvme_free_ns+0x15e/0x1f7 [nvme_core]
>   nvme_remove_namespaces+0x22f/0x390 [nvme_core]
>   nvme_do_delete_ctrl+0xac/0x106 [nvme_core]
>   process_one_work+0x878/0x13e0
>   worker_thread+0x87/0xf70
>   kthread+0x30f/0x3d0
>   ret_from_fork+0x1f/0x30
> 
> Signed-off-by: Lei Yin <yinlei2 at lenovo.com>
> ---
>   drivers/nvme/host/nvme.h | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index c3e4d9b6f9c0..b441c5ce4157 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -749,8 +749,17 @@ static inline void nvme_trace_bio_complete(struct request *req,
>   {
>   	struct nvme_ns *ns = req->q->queuedata;
>   
> -	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio)
> +	if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio) {
>   		trace_block_bio_complete(ns->head->disk->queue, req->bio);
> +
> +		/* Point bio->bi_disk to head disk.
> +		 * This bio maybe as other bio's parent in bio chain. If this bi_disk
> +		 * is kfreed by nvme_free_ns, other bio may get this bio by __bio_chain_endio
> +		 * in bio_endio, and access this bi_disk. This will trigger heap-use-after-free
> +		 * or null pointer oops.
> +		 */
> +		req->bio->bi_disk = ns->head->disk;
> +	}

This is absolutely the wrong place to do this. This is a tracing
function, it should not have any other logic.

What tree is this against anyways? There is no bi_disk in struct bio 
anymore.



More information about the Linux-nvme mailing list