nvme-tcp bricks my computer

Sagi Grimberg sagi at grimberg.me
Wed Feb 10 17:33:40 EST 2021


> Hi Sagi,
> 
> I was finally able to get back to the crash issue.
> 
> Using Wireshark, I compared the PDUs sent by nvmet with those sent by 
> our home-brewed Central Discovery Controller (CDC). I did not see any 
> major differences in the data itself. However, there is one significant 
> difference in the way that nvmet and the CDC indicate that a command 
> has completed successfully.
> 
> The NVMe-oF spec describes two ways that a Controller can indicate to 
> the Host that a command has completed successfully. One way is to send a 
> Response Capsule with the "status" set to 0. Another way is to set the 
> SUCCESS bit to 1 in the last C2HData PDU. This approach eliminates the 
> need for a Response Capsule PDU. Ref. NVMe-oF specs, section 7.4.5.2, 
> 5th paragraph.
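> 
> In host terms the choice comes down to two flag bits in the C2HData 
> PDU header. A rough sketch of the decision (the bit positions follow 
> my reading of the spec; the names are mine, not the driver's):
> 
>     #include <stdint.h>
>     #include <stdbool.h>
> 
>     #define C2HDATA_FLAG_LAST_PDU (1 << 2) /* last C2HData for cmd  */
>     #define C2HDATA_FLAG_SUCCESS  (1 << 3) /* done, no rsp capsule  */
> 
>     /* Does the host still have to wait for a Response Capsule? */
>     static bool rsp_capsule_expected(uint8_t c2hdata_flags)
>     {
>             if ((c2hdata_flags & C2HDATA_FLAG_LAST_PDU) &&
>                 (c2hdata_flags & C2HDATA_FLAG_SUCCESS))
>                     return false;
>             return true;
>     }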
> 
> The CDC sets the last C2HData's SUCCESS bit to 1 instead of sending a 
> Response Capsule. nvmet sets SUCCESS=0 and sends a separate Response 
> Capsule. Could this be the cause of the crash?

Don't think so, but did the host ask for this in the connect?
The host must explicitly request it when connecting: with nvme-cli
that means passing disable_sqflow (this optimization is only possible
when SQ flow control is explicitly disabled).
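For example, something along these lines should request it (the
address, port and NQN below are placeholders for your CDC, not a
known-good recipe):

  nvme connect -t tcp -a 192.168.0.10 -s 8009 \
       -n nqn.2014-08.org.nvmexpress.discovery --disable-sqflow

Without that option the host never asks for the SUCCESS shortcut, so
the controller is expected to send a Response Capsule.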

In any event, even if the controller is misbehaving, it shouldn't crash
the host. I'll look into that.


