[PATCH] nvme-tcp: Fix possible race of io_work and direct send

Or Gerlitz gerlitz.or at gmail.com
Thu Jan 7 14:05:32 EST 2021


On Mon, Dec 21, 2020 at 10:06 AM Sagi Grimberg <sagi at grimberg.me> wrote:
> We may send a request (with or without its data) from two paths:
> 1. From our I/O context nvme_tcp_io_work which is triggered from:
> - queue_rq
> - r2t reception
> - socket data_ready and write_space callbacks

> 2. Directly from queue_rq if the send_list is empty (because
> we want to save the context switch associated with scheduling
> our io_work).
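
For readers not familiar with the driver, a condensed sketch of those two
paths follows; it is reconstructed and simplified rather than taken from
the patch, but the helper and field names follow drivers/nvme/host/tcp.c.
Both paths end up writing to the same socket via nvme_tcp_try_send(),
which is what the send_mutex in the hunk further down is there to
serialize.

        /* Path 1: the per-queue I/O context, running on queue->io_cpu. */
        static void nvme_tcp_io_work(struct work_struct *w)
        {
                struct nvme_tcp_queue *queue =
                        container_of(w, struct nvme_tcp_queue, io_work);

                /* send side: pushes request PDUs/data onto the socket;
                 * racing with path 2 unless both take send_mutex */
                nvme_tcp_try_send(queue);
                /* receive side, rescheduling on pending work, etc. elided */
        }

        /* Path 2: direct send from queue_rq when nothing else is queued. */
        static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
                        bool sync, bool last)
        {
                struct nvme_tcp_queue *queue = req->queue;
                bool empty = ...; /* true if req is the only queued request */

                if (queue->io_cpu == smp_processor_id() &&
                    sync && empty && mutex_trylock(&queue->send_mutex)) {
                        /* same socket as path 1, hence the need to serialize */
                        nvme_tcp_try_send(queue);
                        mutex_unlock(&queue->send_mutex);
                } else {
                        queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
                }
        }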

> Fixes: db5ad6b7f8cd ("nvme-tcp: try to send request in queue_rq context")
> Reported-by: Potnuri Bharat Teja <bharat at chelsio.com>
> Reported-by: Samuel Jones <sjones at kalrayinc.com>

> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c

> @@ -279,7 +289,7 @@ static inline void nvme_tcp_queue_request(struct nvme_tcp_request *req,
>         if (queue->io_cpu == smp_processor_id() &&
>             sync && empty && mutex_trylock(&queue->send_mutex)) {

but we can't rely on the processor id if we are in a preemptible context:

[  373.940336] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u8:4/457
[  373.942037] caller is nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
[  373.942144] CPU: 0 PID: 457 Comm: kworker/u8:4 Not tainted 5.11.0-rc1+ #116
[  373.942171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[  373.942196] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
[  373.942259] Call Trace:
[  373.942286]  dump_stack+0xd0/0x119
[  373.942346]  check_preemption_disabled+0xec/0xf0
[  373.942397]  nvme_tcp_submit_async_event+0x4da/0x690 [nvme_tcp]
[  373.942449]  nvme_async_event_work+0x11d/0x1f0 [nvme_core]
[  373.942496]  ? nvme_ctrl_loss_tmo_show+0x140/0x140 [nvme_core]
[  373.942576]  process_one_work+0x9f3/0x1630
[  373.942649]  ? pwq_dec_nr_in_flight+0x330/0x330
[  373.942765]  worker_thread+0x9e/0xf60
[  373.942849]  ? process_one_work+0x1630/0x1630
[  373.942889]  kthread+0x362/0x480
[  373.942918]  ? kthread_create_on_node+0x100/0x100
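
The caller in the trace is running from the nvme-wq workqueue, i.e. plain
process context with preemption enabled, which is what trips
check_preemption_disabled(). Roughly (chain reconstructed from the trace
and the driver, simplified):

        /*
         * nvme_async_event_work()              preemptible nvme-wq worker
         *   nvme_tcp_submit_async_event()
         *     nvme_tcp_queue_request(...)      direct-send attempt
         *       smp_processor_id()             CONFIG_DEBUG_PREEMPT warning
         */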

       /*
+        * if we're the first on the send_list and we can try to send
+        * directly, otherwise queue io_work. Also, only do that if we
+        * are on the same cpu, so we don't introduce contention.
+        */
+       if (queue->io_cpu == smp_processor_id() &&
+           sync && empty && mutex_trylock(&queue->send_mutex)) {
+               nvme_tcp_try_send(queue);
+               mutex_unlock(&queue->send_mutex);
+       } else {
+               queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
+       }
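
Since the cpu comparison is only an optimization (a wrong guess just means
we fall back to queueing io_work, and correctness comes from the
send_mutex, not from the cpu check), one way to quiet the warning could be
to read the cpu id with raw_smp_processor_id(), which skips the
DEBUG_PREEMPT check. Just a sketch, not necessarily what was or should be
applied upstream:

        if (queue->io_cpu == raw_smp_processor_id() &&
            sync && empty && mutex_trylock(&queue->send_mutex)) {
                /*
                 * best-effort locality hint: the task may still migrate,
                 * but the trylock above is what serializes the senders
                 */
                nvme_tcp_try_send(queue);
                mutex_unlock(&queue->send_mutex);
        } else {
                queue_work_on(queue->io_cpu, nvme_tcp_wq, &queue->io_work);
        }

Another option would be get_cpu()/put_cpu() around just the comparison,
but the result can go stale the moment preemption is re-enabled, so it
does not buy much over the raw read here.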

So before this patch, was that check not needed? I was not sure about all
the involved contexts and how to fix this, so I'm just sending the report.


