nvme-tcp bricks my computer

Sagi Grimberg sagi at grimberg.me
Mon Feb 15 16:35:37 EST 2021


> Hi Sagi,

Hey,

> Just to give you an update...
> 
> We're still investigating the root cause of the crash.
> 
> We found a bug in our Discovery Controller related to SGL format (format 
> 0x5A vs. 0x01). When the host sends a "Set Feature" to configure AER/AEN 
> with a SGL format of 0x5A,

This is coming from:
--
static void nvme_tcp_set_sg_null(struct nvme_command *c)
{
         struct nvme_sgl_desc *sg = &c->common.dptr.sgl;

         sg->addr = 0;
         sg->length = 0;
         sg->type = (NVME_TRANSPORT_SGL_DATA_DESC << 4) |
                         NVME_SGL_FMT_TRANSPORT_A;
}
--
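
For reference, a minimal standalone sketch of where the two type values
in question come from. The enum values are the ones in the kernel's
include/linux/nvme.h; sgl_type() and sgl_selfcheck() are hypothetical
helpers made up for illustration, not driver functions. The 0x01 case
is, as far as I can tell, the data block descriptor with the offset sub
type.

```c
#include <assert.h>
#include <stdint.h>

/* SGL descriptor types (high nibble), values as in include/linux/nvme.h */
enum {
	NVME_SGL_FMT_DATA_DESC       = 0x00,
	NVME_TRANSPORT_SGL_DATA_DESC = 0x05,
};

/* SGL descriptor sub types (low nibble), values as in include/linux/nvme.h */
enum {
	NVME_SGL_FMT_OFFSET      = 0x01,
	NVME_SGL_FMT_TRANSPORT_A = 0x0A,
};

/* Hypothetical helper: pack descriptor type and sub type into one byte. */
static inline uint8_t sgl_type(uint8_t desc_type, uint8_t sub_type)
{
	return (uint8_t)((desc_type << 4) | sub_type);
}

/* Hypothetical self-check covering the two formats discussed here. */
static inline void sgl_selfcheck(void)
{
	/* Transport SGL, sub type A: what nvme_tcp_set_sg_null() produces. */
	assert(sgl_type(NVME_TRANSPORT_SGL_DATA_DESC,
			NVME_SGL_FMT_TRANSPORT_A) == 0x5A);
	/* Data block descriptor with offset sub type: the 0x01 case. */
	assert(sgl_type(NVME_SGL_FMT_DATA_DESC,
			NVME_SGL_FMT_OFFSET) == 0x01);
}
```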

> the DC responds with an R2T, which is
> obviously a bug. This does not happen when the SGL format is 0x01. We 
> believe that this R2T, because it is unexpected by the nvme-tcp module, 
> causes the module to crash.

I'm assuming that's because the R2T has a data length of 0? set_features
does not pass any data (the feature offset/value is in the sqe)...
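
To illustrate the kind of check I have in mind, here is a rough sketch
of how a host could reject an R2T that does not match the request. This
is not the nvme-tcp code; struct req_state and r2t_check() are
hypothetical names, and the in-order offset rule is a simplifying
assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified request state (not the driver's struct). */
struct req_state {
	uint32_t data_len;	/* total bytes the command sends to the controller */
	uint32_t data_sent;	/* bytes already sent */
};

/* Returns 0 if the R2T can be honored, -1 if it should be rejected. */
static inline int r2t_check(const struct req_state *req,
			    uint32_t r2t_offset, uint32_t r2t_length)
{
	if (req->data_sent > req->data_len)
		return -1;	/* inconsistent request state */
	if (r2t_length == 0)
		return -1;	/* nothing requested, R2T makes no sense */
	if (r2t_offset != req->data_sent)
		return -1;	/* assumption: data is requested in order */
	if (r2t_length > req->data_len - req->data_sent)
		return -1;	/* asks for more than the command carries */
	return 0;
}
```

With data_len == 0 (a command like set_features that carries no data),
every R2T is rejected by this check, whatever its length.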

> One of our engineers that is more familiar with kernel modules is 
> currently trying to understand how the R2T would cause nvme-tcp to crash.
> 
> I will let you know if/when I get more info.

Cool, thanks.


