[PATCH] nvme-host: tcp: do not read from the socket if we are performing a reset
Sagi Grimberg
sagi at grimberg.me
Tue May 3 04:48:10 PDT 2022
> nvme_tcp_io_work() may be scheduled while the controller is resetting,
> triggering kernel crashes because of invalid memory accesses.
>
> general protection fault, probably for non-canonical
> address 0x82f4228d6e5c6
Can you please provide more details on the kernel where this happened?
Does this reproduce with an upstream kernel?
> [exception RIP: _copy_to_iter+1344]
> [ffffa8a9e5c07c90] __skb_datagram_iter at ffffffffb0c339d8
> [ffffa8a9e5c07cf8] skb_copy_datagram_iter at ffffffffb0c33c83
> [ffffa8a9e5c07d28] nvme_tcp_recv_data at ffffffffc04dbff7 [nvme_tcp]
> [ffffa8a9e5c07d78] nvme_tcp_recv_skb at ffffffffc04dc90e [nvme_tcp]
> [ffffa8a9e5c07dc0] tcp_read_sock at ffffffffb0cfedbe
> [ffffa8a9e5c07e10] nvme_tcp_try_recv at ffffffffc04da518 [nvme_tcp]
> [ffffa8a9e5c07e58] nvme_tcp_io_work at ffffffffc04dbc54 [nvme_tcp]
> [ffffa8a9e5c07e88] process_one_work at ffffffffb0505408
> [ffffa8a9e5c07ed0] worker_thread at ffffffffb05059a0
>
> Fix this bug by preventing nvme_tcp_io_work() from running if the queue
> is not flagged as "LIVE" and is not in the process of
> connecting to the target.
The current controller teardown flow is supposed to guarantee that
io_work will not run after a queue is stopped. See
nvme_tcp_stop_queue(): it shuts down the socket so that no new
datagrams go out to (or come in from) the network, restores the
socket callbacks so data_ready/write_space are not called again, and
cancels io_work. New submissions are prevented because the queue
state clears LIVE.
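For reference, the stop sequence I'm referring to looks roughly like
this (a simplified sketch of the flow described above, not a verbatim
copy of the upstream code):

static void __nvme_tcp_stop_queue(struct nvme_tcp_queue *queue)
{
	/* no new datagrams can go out to or come in from the network */
	kernel_sock_shutdown(queue->sock, SHUT_RDWR);
	/* restore the sk callbacks so data_ready/write_space no longer
	 * schedule io_work
	 */
	nvme_tcp_restore_sock_calls(queue);
	/* flush a running io_work and make sure it is not requeued */
	cancel_work_sync(&queue->io_work);
}

This only runs after the caller has cleared NVME_TCP_Q_LIVE, so new
submissions are fenced off by the queue state as well.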
So in theory, io_work should not be running concurrently with queue
shutdown. Can you explain how it gets there?
>
> Signed-off-by: Maurizio Lombardi <mlombard at redhat.com>
> ---
> drivers/nvme/host/tcp.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index ad3a2bf2f1e9..e3ef014bbc0b 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -103,6 +103,7 @@ enum nvme_tcp_queue_flags {
> NVME_TCP_Q_ALLOCATED = 0,
> NVME_TCP_Q_LIVE = 1,
> NVME_TCP_Q_POLLING = 2,
> + NVME_TCP_Q_CONNECTING = 3,
> };
>
> enum nvme_tcp_recv_state {
> @@ -1213,6 +1214,10 @@ static void nvme_tcp_io_work(struct work_struct *w)
> bool pending = false;
> int result;
>
> + if (unlikely(!test_bit(NVME_TCP_Q_LIVE, &queue->flags) &&
> + !test_bit(NVME_TCP_Q_CONNECTING, &queue->flags)))
> + return;
How is this protecting anything? If the queue is torn down we may access
other freed memory, not just the socket.
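To illustrate (a hypothetical interleaving, not taken from your
report) why the check only narrows the window instead of closing it:

/*
 * Hypothetical interleaving with the proposed check in place:
 *
 *   nvme_tcp_io_work()                  teardown path
 *   ------------------                  -------------
 *   test_bit(Q_LIVE)   -> still set
 *                                        clear LIVE, shut down socket,
 *                                        free queue/request resources
 *   nvme_tcp_try_recv(queue)            -> dereferences freed memory
 */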
> +
> if (mutex_trylock(&queue->send_mutex)) {
> result = nvme_tcp_try_send(queue);
> mutex_unlock(&queue->send_mutex);
> @@ -1670,6 +1675,8 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
> struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
> int ret;
>
> + set_bit(NVME_TCP_Q_CONNECTING, &ctrl->queues[idx].flags);
> +
> if (idx)
> ret = nvmf_connect_io_queue(nctrl, idx);
> else
> @@ -1683,6 +1690,7 @@ static int nvme_tcp_start_queue(struct nvme_ctrl *nctrl, int idx)
> dev_err(nctrl->device,
> "failed to connect queue: %d ret=%d\n", idx, ret);
> }
> + clear_bit(NVME_TCP_Q_CONNECTING, &ctrl->queues[idx].flags);
> return ret;
> }
>
Question: can you tell from your stack dump whether this queue is the
admin queue or an I/O queue?
If it is indeed the admin queue, please check whether you have this
related fix applied:
0fa0f99fc84e ("nvme: fix a possible use-after-free in controller reset
during load")