crash at nvme_tcp_init_iter with header digest enabled

Daniel Wagner dwagner at suse.de
Fri Aug 19 00:51:17 PDT 2022


Hi,

we got a customer bug report against our downstream kernel
when doing failover tests with header digest enabled.

The whole crash looks like a use-after-free bug, but
so far we have not been able to figure out where it happens.

  nvme nvme13: queue 1: header digest flag is cleared
  nvme nvme13: receive failed:  -71
  nvme nvme13: starting error recovery
  nvme nvme7: Reconnecting in 10 seconds...

  RIP: nvme_tcp_init_iter

  nvme_tcp_recv_skb
  ? tcp_mstamp_refresh
  ? nvme_tcp_submit_async_event
  tcp_read_sock
  nvme_tcp_try_recv
  nvme_tcp_io_work
  process_one_work
  ? process_one_work
  worker_thread
  ? process_one_work
  kthread
  ? set_kthread_struct
  ret_from_fork

In order to rule out that this is caused by a reuse of a command id, I
added a test patch which always clears the request pointer (see below)
and hoped to see

   "got bad cqe.command_id %#x on queue %d\n"

but there was none. Instead, the crash disappeared. It looks like we are
not clearing the request in the error path, but so far I haven't figured
out how this is related to the header digest being enabled.

Anyway, this is just an FYI; in case anyone has an idea where to poke
at, I am listening.

Thanks,
Daniel



diff --git a/block/blk-mq.c b/block/blk-mq.c
index 98cc93d58575..bfadccb90be6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -847,6 +847,13 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
 }
 EXPORT_SYMBOL(blk_mq_tag_to_rq);
 
+void blk_mq_tag_reset(struct blk_mq_tags *tags, unsigned int tag)
+{
+	struct request *rq = tags->rqs[tag];
+	cmpxchg(&tags->rqs[tag], rq, NULL);
+}
+EXPORT_SYMBOL(blk_mq_tag_reset);
+
 static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq,
 			       void *priv, bool reserved)
 {
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 78cfe97031ca..f9a641fb7353 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -504,6 +504,8 @@ static int nvme_tcp_process_nvme_cqe(struct nvme_tcp_queue *queue,
 		nvme_tcp_error_recovery(&queue->ctrl->ctrl);
 		return -EINVAL;
 	}
+	blk_mq_tag_reset(nvme_tcp_tagset(queue),
+			 nvme_tag_from_cid(cqe->command_id));
 
 	req = blk_mq_rq_to_pdu(rq);
 	if (req->status == cpu_to_le16(NVME_SC_SUCCESS))
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 1d18447ebebc..a338ec65f3c8 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -470,6 +470,7 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q,
 		unsigned int op, blk_mq_req_flags_t flags,
 		unsigned int hctx_idx);
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
+void blk_mq_tag_reset(struct blk_mq_tags *tags, unsigned int tag);
 
 enum {
 	BLK_MQ_UNIQUE_TAG_BITS = 16,




More information about the Linux-nvme mailing list