[PATCH RFC] nvme-tcp: Implement recvmsg() receive flow

Hannes Reinecke hare at suse.de
Mon Sep 22 23:30:06 PDT 2025


On 9/22/25 19:41, Christoph Hellwig wrote:
> On Fri, Sep 12, 2025 at 01:58:29PM +0200, Hannes Reinecke wrote:
>> Switch to use recvmsg() so that we get access to TLS control
>> messages eg for handling TLS KeyUpdate.
> 
> That is a very spare commit message for a huge change.  Why did it not
> use recvmsg before?  I'm pretty sure there are some tradeoffs here.
> 
Well, the main reason was that the iSCSI driver was used as a template
for NVMe-TCP, and that driver uses ->read_sock().

But the technical reason is that ->read_sock() is driven from the
ingress side: the callback is invoked for each skb received.
This can (theoretically) reduce latency, as the callback is
only invoked when there is data to be processed.
The drawback is that we lose access to the control messages, as these
are only extracted by the recvmsg code, so we cannot handle
specific control messages like TLS alerts or KeyUpdate messages.
And pacing ->read_sock() invocations gets tricky with TLS,
as the TLS stream is overlaid on TCP segments, so the skbs
received by TCP are not the skbs seen by the TLS implementation
of ->read_sock() (stream parser magic).
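
To illustrate the ingress-driven model, here is a small userspace C
model (not kernel code). The types fake_skb, fake_desc and fake_actor_t
and the functions fake_read_sock() and count_actor() are hypothetical
stand-ins that only roughly mirror read_descriptor_t and
sk_read_actor_t. The point is that the socket side pushes each buffer
into the callback, and the callback only ever sees payload bytes; there
is nowhere for a TLS record type to arrive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only a payload slice. There is no
 * slot for control information such as the TLS record type. */
struct fake_skb {
	const char *data;
	size_t len;
};

/* Hypothetical stand-in for read_descriptor_t: consumer-side state. */
struct fake_desc {
	size_t count;    /* bytes the consumer is still willing to take */
	size_t consumed; /* bytes consumed so far */
};

/* Roughly the shape of an sk_read_actor_t: returns bytes consumed. */
typedef size_t (*fake_actor_t)(struct fake_desc *desc,
			       const struct fake_skb *skb,
			       size_t offset, size_t len);

/* Ingress-driven loop: the socket walks its queued buffers and pushes
 * each one into the actor until the actor stops consuming. */
static size_t fake_read_sock(const struct fake_skb *queue, size_t n,
			     struct fake_desc *desc, fake_actor_t actor)
{
	size_t total = 0;

	assert(desc != NULL && actor != NULL);
	for (size_t i = 0; i < n && desc->count > 0; i++) {
		size_t used = actor(desc, &queue[i], 0, queue[i].len);

		total += used;
		if (used < queue[i].len)
			break; /* consumer is full; the rest stays queued */
	}
	return total;
}

/* Example actor: accepts bytes until desc->count is exhausted. It only
 * ever sees payload, never a record type. */
static size_t count_actor(struct fake_desc *desc, const struct fake_skb *skb,
			  size_t offset, size_t len)
{
	size_t take = len < desc->count ? len : desc->count;

	(void)skb;
	(void)offset;
	desc->count -= take;
	desc->consumed += take;
	return take;
}
```

With queued chunks of 3 and 5 bytes and a consumer budget of 6,
fake_read_sock() returns 6 and leaves the remaining 2 bytes queued.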

The recvmsg workflow has the benefit that we get full
access to the control messages, so we can handle things like
TLS alerts and KeyUpdate messages. And the recvmsg() call
is driven from the consumer side, so we get better
pacing between sendmsg() and recvmsg() calls, which is
particularly important for the NVMe-TCP workflow, where
we need to wait for responses before we can continue
sending more data.
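
For comparison, a userspace sketch of the consumer-driven side, using
the kTLS control-message interface documented for userspace sockets
(SOL_TLS / TLS_GET_RECORD_TYPE). The in-kernel code would not call the
recvmsg() syscall, so treat this as an analogy rather than what the
patch does. The helper names tls_record_type() and
recv_with_record_type() are made up for the example, and falling back
to application data when no control message is present is an
assumption. A KeyUpdate arrives as a handshake record (type 22), an
alert as type 21.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef SOL_TLS
#define SOL_TLS 282            /* value from include/linux/socket.h */
#endif
#ifndef TLS_GET_RECORD_TYPE
#define TLS_GET_RECORD_TYPE 2  /* value from include/uapi/linux/tls.h */
#endif

/* TLS ContentType values (RFC 8446, section 5.1). */
enum tls_content_type {
	TLS_CT_ALERT = 21,
	TLS_CT_HANDSHAKE = 22,        /* KeyUpdate is a handshake message */
	TLS_CT_APPLICATION_DATA = 23,
};

/* Return the TLS record type from the control buffer that recvmsg()
 * filled in; assume application data if no such cmsg is present. */
static int tls_record_type(struct msghdr *msg)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_TLS &&
		    cmsg->cmsg_type == TLS_GET_RECORD_TYPE)
			return *(unsigned char *)CMSG_DATA(cmsg);
	}
	return TLS_CT_APPLICATION_DATA;
}

/* Consumer-driven receive: the caller decides when to read, and gets
 * the record type alongside the payload. */
static ssize_t recv_with_record_type(int fd, void *buf, size_t len, int *type)
{
	union {
		char buf[CMSG_SPACE(sizeof(unsigned char))];
		struct cmsghdr align; /* keeps the control buffer aligned */
	} cbuf;
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = {
		.msg_iov = &iov,
		.msg_iovlen = 1,
		.msg_control = cbuf.buf,
		.msg_controllen = sizeof(cbuf.buf),
	};
	ssize_t ret;

	assert(type != NULL);
	ret = recvmsg(fd, &msg, 0);
	if (ret >= 0)
		*type = tls_record_type(&msg);
	return ret;
}
```

The caller would then hand type 21 records to alert handling and
type 22 records to KeyUpdate processing, and treat only type 23 as
NVMe PDU data.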

Sadly, any performance difference between the two
implementations is completely obscured by network
buffering effects, so I haven't been able to come up
with a meaningful comparison here.

But I'll improve the description.

>> +static size_t nvme_tcp_ddgst_step(void *iter_base, size_t progress, size_t len,
>> +				  void *priv, void *priv2)
>> +{
>> +	u32 *crcp = priv;
>> +
>> +	*crcp = crc32c(*crcp, iter_base, len);
>> +        return 0;
> 
> And fix the whitespace damage while you're at it.
> 
Sure.
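
For what it's worth, a whitespace-clean userspace model of what such a
step callback does: fold each chunk into a running CRC32C for the data
digest. crc32c_update() is a hypothetical bitwise stand-in for the
kernel's crc32c(), and ddgst_chunked() is a hypothetical stand-in for
the iov_iter walk; it assumes the usual seed-with-~0,
invert-at-the-end convention. Returning 0 from the step tells the
iterator that the whole chunk was consumed, so splitting the buffer
differently must not change the digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise, reflected CRC32C (Castagnoli, polynomial 0x82F63B78).
 * Like the kernel's crc32c(), it applies no pre/post inversion itself;
 * the caller seeds with ~0 and inverts the final value. */
static uint32_t crc32c_update(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Step callback with the same signature as the quoted one: fold the
 * chunk into the running digest, report nothing left over. */
static size_t ddgst_step(void *iter_base, size_t progress, size_t len,
			 void *priv, void *priv2)
{
	uint32_t *crcp = priv;

	(void)progress;
	(void)priv2;
	*crcp = crc32c_update(*crcp, iter_base, len);
	return 0;
}

/* Hypothetical iterator stand-in: feed the buffer to the step in
 * fixed-size chunks, as a scattered buffer would arrive. */
static uint32_t ddgst_chunked(const char *data, size_t len, size_t chunk)
{
	uint32_t crc = ~0u;
	size_t off = 0;

	assert(chunk > 0);
	while (off < len) {
		size_t n = len - off < chunk ? len - off : chunk;

		ddgst_step((void *)(data + off), off, n, &crc, NULL);
		off += n;
	}
	return ~crc;
}
```

For the standard test vector "123456789" the digest is 0xE3069283
regardless of the chunk size.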

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



More information about the Linux-nvme mailing list