[bug report] nvme/063 failure (tcp transport)
Hannes Reinecke
hare at suse.de
Sun Jun 1 23:38:35 PDT 2025
On 6/2/25 04:14, Shinichiro Kawasaki wrote:
> On May 21, 2025 / 20:51, Shin'ichiro Kawasaki wrote:
> [...]
>> With this trial fix patch, the KASAN suaf is still observed. I guess it has
>> another cause and requires more debugging.
>
> I chased down the KASAN suaf, and now I think I understand the cause. When it
> happens, nvme_tcp_create_ctrl() fails to create the nvme-tcp controller in the
> call chain below:
>
> nvme_tcp_create_ctrl()
>   nvme_tcp_alloc_ctrl() new=true ... Alloc nvme_tcp_ctrl and admin_tag_set
>   nvme_tcp_setup_ctrl() new=true
>     nvme_tcp_configure_admin_queue() new=true ... Succeed
>       nvme_alloc_admin_tag_set() ... Alloc the tag set for admin_tag_set
>     nvme_stop_keep_alive()
>     nvme_tcp_teardown_admin_queue() remove=false
>     nvme_tcp_configure_admin_queue() new=false
>       nvme_tcp_alloc_admin_queue() ... Fail, but do not call nvme_remove_admin_tag_set()
>   nvme_uninit_ctrl()
>   nvme_put_ctrl() ... Free up the nvme_tcp_ctrl and admin_tag_set
>
> In this call chain, the first call of nvme_tcp_configure_admin_queue()
> succeeds with the new=true argument. The second call fails with the new=false
> argument. Because of new=false, this second call does not call
> nvme_remove_admin_tag_set(), so the admin tag set is not removed.
> nvme_tcp_create_ctrl() assumes that nvme_tcp_setup_ctrl() has called
> nvme_remove_admin_tag_set(), and frees struct nvme_tcp_ctrl, which contains
> the admin_tag_set field. Later on, the timeout handler accesses the
> admin_tag_set field and triggers the KASAN slab-use-after-free BUG.
>
> I created a trial patch below. When the second
> nvme_tcp_configure_admin_queue() call fails, it jumps to the "destroy_admin"
> goto label to call nvme_tcp_teardown_admin_queue(), which calls
> nvme_remove_admin_tag_set(). With this fix, the KASAN suaf no longer appears.
> I will create a formal patch for review. I will post it as a series of two
> patches: one for this KASAN suaf, the other for the WARN.
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index d89c89570d11..74a388550995 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2392,7 +2392,7 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl *ctrl, bool new)
>  		nvme_tcp_teardown_admin_queue(ctrl, false);
>  		ret = nvme_tcp_configure_admin_queue(ctrl, false);
>  		if (ret)
> -			return ret;
> +			goto destroy_admin;
>  	}
>
>  	if (ctrl->icdoff) {
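
For readers following along, the error path described in the quoted analysis
can be modeled in userspace. This is only a rough sketch, not kernel code: the
toy_* names and struct toy_ctrl are hypothetical stand-ins, the failure of the
second configure call is injected by a flag, and it assumes that the
destroy_admin label tears down the admin queue with remove=new.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the controller; not struct nvme_tcp_ctrl. */
struct toy_ctrl {
	bool tag_set_allocated;	/* models a live admin_tag_set */
	bool fail_reconfigure;	/* injects a failure into the second configure */
};

/* Models nvme_tcp_configure_admin_queue(): allocates the tag set only if new. */
static int toy_configure_admin_queue(struct toy_ctrl *c, bool new, bool fail)
{
	if (fail)
		return -1;	/* queue allocation failed, tag set untouched */
	if (new)
		c->tag_set_allocated = true;
	return 0;
}

/* Models nvme_tcp_teardown_admin_queue(): removes the tag set only if remove. */
static void toy_teardown_admin_queue(struct toy_ctrl *c, bool remove)
{
	if (remove)
		c->tag_set_allocated = false;
}

/* Models nvme_tcp_setup_ctrl(); with_fix selects the patched error path. */
static int toy_setup_ctrl(struct toy_ctrl *c, bool new, bool with_fix)
{
	int ret;

	ret = toy_configure_admin_queue(c, new, false);
	if (ret)
		return ret;

	toy_teardown_admin_queue(c, false);
	ret = toy_configure_admin_queue(c, false, c->fail_reconfigure);
	if (ret) {
		if (with_fix)
			goto destroy_admin;
		return ret;	/* unpatched: tag set stays allocated */
	}
	return 0;

destroy_admin:
	/* Assumption: teardown here uses remove=new, as in the first setup. */
	toy_teardown_admin_queue(c, new);
	return ret;
}

/* True if the tag set is still live when the caller would free the ctrl. */
static bool toy_leaks_tag_set(bool with_fix)
{
	struct toy_ctrl c = { .tag_set_allocated = false, .fail_reconfigure = true };

	if (toy_setup_ctrl(&c, true, with_fix) == 0)
		return false;
	return c.tag_set_allocated;
}
```

In this model, toy_leaks_tag_set(false) corresponds to the unpatched
"return ret" path, where the tag set outlives the failed setup, and
toy_leaks_tag_set(true) corresponds to the "goto destroy_admin" path, where it
is removed before the caller frees the controller.
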
Thanks for debugging, that patch looks correct. Please send a patch.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich