[PATCH 1/4] nvme: fix delete uninitialized controller

Sagi Grimberg sagi at grimberg.me
Tue Jan 3 02:30:45 PST 2023


> nvme-fabric controllers can be deleted by
> /sys/class/nvme/nvme<NS>/delete_controller
> echo 1 > /sys/class/nvme/nvme<NS>/delete_controller
> The above command will call nvme_delete_ctrl_sync().
> This function internally tries to change ctrl->state to NVME_CTRL_DELETING.
> NVME_CTRL_LIVE, NVME_CTRL_RESETTING, and NVME_CTRL_CONNECTING states can
> be changed to NVME_CTRL_DELETING.
> If the state is successfully changed, nvme_do_delete_ctrl() is called,
> which is the actual delete logic of controller.
> 
> controller initialization logic changes ctrl->state.
> NEW -> CONNECTING -> LIVE.
> NVME_CTRL_CONNECTING state doesn't ensure that initialization is done.
> 
> So, delete logic can be called before the finish of controller
> initialization.
> So kernel panic would occur because nvme_do_delete_ctrl() dereferences
> uninitialized values.
> 
> BUG: KASAN: null-ptr-deref in do_raw_spin_trylock+0x67/0x180
> Read of size 4 at addr 00000000000000c0 by task bash/928
> 
> CPU: 7 PID: 928 Comm: bash Not tainted 6.1.0 #35
> nvme nvme0: Connect command failed: host path error
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x57/0x81
>   ? do_raw_spin_trylock+0x67/0x180
>   kasan_report+0xba/0x1f0
> nvme nvme0: failed to connect queue: 0 ret=880
>   ? do_raw_spin_trylock+0x67/0x180
>   ? sysfs_file_ops+0x170/0x170
>   kasan_check_range+0x14a/0x1a0
>   do_raw_spin_trylock+0x67/0x180
>   ? do_raw_spin_lock+0x270/0x270
>   ? nvme_remove_namespaces+0x1bc/0x3d0
>   _raw_spin_lock_irqsave+0x4b/0x90
>   ? blk_mq_quiesce_queue+0x1b/0x160
>   blk_mq_quiesce_queue+0x1b/0x160
>   nvme_tcp_delete_ctrl+0x4b/0x70
>   nvme_do_delete_ctrl+0x135/0x141
>   nvme_sysfs_delete.cold+0x8/0xd
>   kernfs_fop_write_iter+0x34b/0x520
>   vfs_write+0x83a/0xd20
>   ? kernel_write+0x630/0x630
>   ? rcu_read_lock_sched_held+0x12/0x80
>   ? lock_acquire+0x4f4/0x630
>   ? __fget_light+0x51/0x230
>   ksys_write+0xf9/0x1d0
>   ? __ia32_sys_read+0xa0/0xa0
>   ? syscall_enter_from_user_mode+0x1d/0x50
>   do_syscall_64+0x3b/0x90
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7fc955d10104
> 
> Fixes: 1a353d85b02d ("nvme: add fabrics sysfs attributes")
> Signed-off-by: Taehee Yoo <ap420073 at gmail.com>
> ---
>   drivers/nvme/host/core.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index d307ae4d8a57..cd4c80ca66d4 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -243,7 +243,8 @@ static void nvme_delete_ctrl_sync(struct nvme_ctrl *ctrl)
>   	 * since ->delete_ctrl can free the controller.
>   	 */
>   	nvme_get_ctrl(ctrl);
> -	if (nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
> +	if (test_bit(NVME_CTRL_STARTED_ONCE, &ctrl->flags) &&
> +	    nvme_change_ctrl_state(ctrl, NVME_CTRL_DELETING))
>   		nvme_do_delete_ctrl(ctrl);

So what is the outcome now? if the controller kept on dangling? what
triggers the controller deletion?

>   	nvme_put_ctrl(ctrl);
>   }

I don't think this is the correct approach.
the delete should fully fence the initialization and then delete
the controller.

In this case, the transport driver should not quiesce a non-existent
queue.

If further synchronization is needed, then it should be added so that
delete will fully fence the initialization.



More information about the Linux-nvme mailing list