[PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
Maurizio Lombardi
mlombard at redhat.com
Sun Nov 14 23:47:20 PST 2021
Hello Sagi,
On Sun, Nov 14, 2021 at 12:28:45PM +0200, Sagi Grimberg wrote:
>
>
> I think you should shutdown the socket after you restore the socket
> callbacks, and after that cancel_work_sync to deny a self-requeue.
>
> > - flush_work(&queue->io_work);
> > + while (flush_work(&queue->io_work));
> > nvmet_tcp_uninit_data_in_cmds(queue);
> > nvmet_sq_destroy(&queue->nvme_sq);
> >
>
> How about something like this:
> --
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 6eb0b3153477..6425375faf5b 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1436,7 +1436,9 @@ static void nvmet_tcp_release_queue_work(struct work_struct *w)
> mutex_unlock(&nvmet_tcp_queue_mutex);
>
> nvmet_tcp_restore_socket_callbacks(queue);
> - flush_work(&queue->io_work);
> + /* stop accepting incoming data */
> + kernel_sock_shutdown(queue->sock, SHUT_RD);
> + cancel_work_sync(&queue->io_work);
>
> nvmet_tcp_uninit_data_in_cmds(queue);
> nvmet_sq_destroy(&queue->nvme_sq);
> --
>
This one doesn't work; remember that nvmet_sq_destroy() will start io_work
again and, as I wrote in my previous email, recvmsg() may still return some
data even after the socket has been shut down. The crash looks like this:
[ 1034.234574] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1034.764498] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1035.291398] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1038.471686] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1039.002853] general protection fault, probably for non-canonical address 0x51ede3866dd88e22: 0000 [#1] SMP PTI
[ 1039.006622] CPU: 1 PID: 4869 Comm: kworker/1:2H Kdump: loaded Tainted: G W --------- --- 5.14.0_nvmet1+ #1
[ 1039.008778] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1039.009840] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[ 1039.011040] RIP: 0010:memcpy_erms+0x6/0x10
[ 1039.011841] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 1039.015216] RSP: 0018:ffffc02c4059fb30 EFLAGS: 00010206
[ 1039.016176] RAX: 51ede3866dd88e22 RBX: 000000000000248e RCX: 0000000000001000
[ 1039.017472] RDX: 0000000000001000 RSI: ffff99d7c397948e RDI: 51ede3866dd88e22
[ 1039.018767] RBP: ffff99d7cc02b000 R08: 0000000000000001 R09: 000000000000fe88
[ 1039.020074] R10: 000000000000012a R11: 0000000000003ed6 R12: 0000000000000000
[ 1039.021371] R13: 0000000000001000 R14: ffff99d7cc02b010 R15: 0000000000001a48
[ 1039.022668] FS: 0000000000000000(0000) GS:ffff99d8f7d00000(0000) knlGS:0000000000000000
[ 1039.024134] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1039.025181] CR2: 00007ffbf45c1658 CR3: 00000001051e4003 CR4: 00000000003706e0
[ 1039.026497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1039.027836] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1039.029185] Call Trace:
[ 1039.029696] _copy_to_iter+0x44d/0x710
[ 1039.030483] ? __virt_addr_valid+0x45/0x70
[ 1039.031284] ? __check_object_size.part.0+0x45/0x140
[ 1039.032200] __skb_datagram_iter+0x19b/0x2d0
[ 1039.032990] ? zerocopy_sg_from_iter+0x50/0x50
[ 1039.033788] skb_copy_datagram_iter+0x33/0x80
[ 1039.034568] tcp_recvmsg_locked+0x299/0x960
[ 1039.035338] tcp_recvmsg+0x72/0x1d0
[ 1039.036023] ? bpf_lsm_socket_sendmsg+0x10/0x10
[ 1039.036929] inet_recvmsg+0x57/0xf0
[ 1039.037624] nvmet_tcp_try_recv.constprop.0+0xcb/0x310 [nvmet_tcp]
[ 1039.038913] ? pick_next_task_fair+0x39/0x3b0
[ 1039.039771] nvmet_tcp_io_work+0x44/0xabd [nvmet_tcp]
[ 1039.040684] ? dequeue_task_fair+0xb1/0x370
[ 1039.041430] ? finish_task_switch.isra.0+0xb1/0x290
[ 1039.042305] process_one_work+0x1e6/0x380
[ 1039.043061] worker_thread+0x53/0x3d0
[ 1039.043720] ? process_one_work+0x380/0x380
[ 1039.044467] kthread+0x10f/0x130
[ 1039.045059] ? set_kthread_struct+0x40/0x40
[ 1039.045806] ret_from_fork+0x22/0x30
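Going back to the ordering: combining your kernel_sock_shutdown() with the
flush loop from my patch would look roughly like this (just a sketch of the
release path based on the hunks above, not a tested fix; the comments mark
where io_work can still come back):

	/* tail of nvmet_tcp_release_queue_work(), sketched from the hunks above */
	mutex_unlock(&nvmet_tcp_queue_mutex);

	nvmet_tcp_restore_socket_callbacks(queue);
	/* stop accepting incoming data */
	kernel_sock_shutdown(queue->sock, SHUT_RD);
	/*
	 * io_work may requeue itself while recvmsg() keeps returning data,
	 * so a single flush_work()/cancel_work_sync() is not enough; keep
	 * flushing until it stops requeueing.
	 */
	while (flush_work(&queue->io_work))
		;

	nvmet_tcp_uninit_data_in_cmds(queue);
	/*
	 * nvmet_sq_destroy() schedules io_work again, so at this point
	 * io_work can still run against commands that were just
	 * uninitialized, which is the race I am hitting above.
	 */
	nvmet_sq_destroy(&queue->nvme_sq);

Even with the shutdown in place, the window after nvmet_sq_destroy() stays
open, which is why I don't think cancel_work_sync() alone is sufficient.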
Maurizio