nvme-tcp bricks my computer

Sagi Grimberg sagi at grimberg.me
Mon Feb 15 16:42:29 EST 2021


>> Hi Sagi,
> 
> Hey,
> 
>> Just to give you an update...
>>
>> We're still investigating the root cause of the crash.
>>
>> We found a bug in our Discovery Controller related to SGL format 
>> (format 0x5A vs. 0x01). When the host sends a "Set Feature" to 
>> configure AER/AEN with a SGL format of 0x5A,
> 
> This is coming from:
> -- 
> static void nvme_tcp_set_sg_null(struct nvme_command *c)
> {
>          struct nvme_sgl_desc *sg = &c->common.dptr.sgl;
> 
>          sg->addr = 0;
>          sg->length = 0;
>          sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) |
>                          NVME_SGL_FMT_TRANSPORT_A;
> }
> -- 
> 
>> the DC responds with an R2T, which is
>> obviously a bug. This does not happen when the SGL format is 0x01. We 
>> believe that this R2T, because it is unexpected by the nvme-tcp 
>> module, causes the module to crash.
> 
> I'm assuming that's because the R2T has a data length of 0? set_features
> does not pass any data (the feature offset/value is in the SQE)...
> 
>> One of our engineers that is more familiar with kernel modules is 
>> currently trying to understand how the R2T would cause nvme-tcp to crash.
>>
>> I will let you know if/when I get more info.
> 
> Cool, thanks.
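
For reference, the two SGL identifier bytes mentioned above can be derived from the descriptor type (high nibble) and subtype (low nibble). This is only a user-space sketch: the enum values are the ones I believe include/linux/nvme.h defines, and sgl_type() is a hypothetical helper, not a kernel function. The 0x01 case assumes a data block descriptor with the offset subtype.

```c
#include <assert.h>
#include <stdint.h>

/* Descriptor types (high nibble), assumed to match include/linux/nvme.h. */
enum {
	NVME_SGL_FMT_DATA_DESC		= 0x00,
	NVME_TRANSPORT_SGL_DATA_DESC	= 0x05,
};

/* Descriptor subtypes (low nibble), assumed to match include/linux/nvme.h. */
enum {
	NVME_SGL_FMT_OFFSET		= 0x01,
	NVME_SGL_FMT_TRANSPORT_A	= 0x0A,
};

/* Hypothetical helper: pack type and subtype into the SGL identifier byte. */
static uint8_t sgl_type(uint8_t desc_type, uint8_t subtype)
{
	/* Both fields are 4 bits wide. */
	assert(desc_type <= 0x0f && subtype <= 0x0f);
	return (uint8_t)((desc_type << 4) | subtype);
}
```

With these values, sgl_type(NVME_TRANSPORT_SGL_DATA_DESC, NVME_SGL_FMT_TRANSPORT_A) gives 0x5A, which is what nvme_tcp_set_sg_null() puts on the wire, and sgl_type(NVME_SGL_FMT_DATA_DESC, NVME_SGL_FMT_OFFSET) gives 0x01.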

Does this make the crash go away at least?
--
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 69f59d2c5799..5274cc5800f9 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -568,6 +568,13 @@ static int nvme_tcp_setup_h2c_data_pdu(struct nvme_tcp_request *req,
         req->pdu_len = le32_to_cpu(pdu->r2t_length);
         req->pdu_sent = 0;

+       if (unlikely(!req->pdu_len)) {
+               dev_err(queue->ctrl->ctrl.device,
+                       "req %d r2t len is %u, probably a bug...\n",
+                       rq->tag, req->pdu_len);
+               return -EPROTO;
+       }
+
         if (unlikely(req->data_sent + req->pdu_len > req->data_len)) {
                 dev_err(queue->ctrl->ctrl.device,
                         "req %d r2t len %u exceeded data len %u (%zu sent)\n",
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index ac2d9ed23cea..d82df6cca801 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
--
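
To illustrate why the existing bounds check alone does not catch this case, here is a user-space sketch of the two checks from the host-side hunk above. struct r2t_state and r2t_check() are hypothetical stand-ins for the nvme_tcp_request fields and the logic in nvme_tcp_setup_h2c_data_pdu(); I'm assuming both rejections map to -EPROTO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the request fields the checks touch. */
struct r2t_state {
	size_t   data_len;	/* total bytes the command transfers */
	size_t   data_sent;	/* bytes already sent to the controller */
	uint32_t pdu_len;	/* length requested by the current R2T */
};

/* Returns 0 if the R2T can be honored, -EPROTO otherwise. */
static int r2t_check(struct r2t_state *st, uint32_t r2t_length)
{
	assert(st != NULL);
	st->pdu_len = r2t_length;

	/* New check: an R2T that asks for no data is rejected. */
	if (st->pdu_len == 0)
		return -EPROTO;

	/* Existing check: the R2T must not ask for more than remains. */
	if (st->data_sent + st->pdu_len > st->data_len)
		return -EPROTO;

	return 0;
}
```

For a command with no data (data_len == 0, as with set_features), any non-zero R2T length already trips the existing check, but a zero-length R2T evaluates to 0 + 0 > 0, which is false, so without the new check it would pass through.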


