[PATCH RFC] nvme-tcp: Implement recvmsg() receive flow

Alistair Francis Alistair.Francis at wdc.com
Thu Feb 26 02:42:38 PST 2026


On Thu, 2026-02-26 at 09:40 +0100, Hannes Reinecke wrote:
> On 2/26/26 00:37, Alistair Francis wrote:
> > On Wed, 2026-02-25 at 16:02 +0100, Hannes Reinecke wrote:
> > > On 2/25/26 14:15, Alistair Francis wrote:
> > > > On Wed, 2026-02-25 at 12:41 +0100, Hannes Reinecke wrote:
> > > > > On 2/25/26 11:56, Alistair Francis wrote:
> > > [ .. ]
> > > > > > 
> > > > > > This doesn't work unfortunately.
> > > > > > 
> > > > > > The problem is what happens if queue->data_remaining is
> > > > > > smaller than iov_iter_count(&req->iter)?
> > > > > > 
> > > > > > queue->data_remaining is set by the data length in the c2h,
> > > > > > while the length of the request iov_iter_count(&req->iter)
> > > > > > is set when the request is submitted.
> > > > > > 
> > > > > > If queue->data_remaining ends up being smaller than
> > > > > > iov_iter_count(&req->iter) then we need to read less data
> > > > > > than the actual count of req->iter.
> > > > > > 
> > > > > > So we need an iov_iter_truncate(), but then we end up
> > > > > > overwriting the data on the next iteration as we have no
> > > > > > way to keep...
> > > > > > 
> > > > > Question is, though: what _is_ in the remaining iov?
> > > > 
> > > > Which remaining iov?
> > > > 
> > > 
> > > Well, if queue->
> > > > > The most reasonable explanation would be that it's the start
> > > > > of the next PDU (which we haven't accounted for, and hence
> > > > > haven't set up pointers correctly).
> > > > 
> > > > The next PDU seems fine, it's just the next data that goes on
> > > > the current (and correct) IOV, just overwriting the previous
> > > > data as there is no offset.
> > > > 
> > > 
> > > The offset is in the iov (i.e. you advance the iov iter to
> > > capture the offset)
> > > 
> > > > > I can see this happening for TLS when the sender doesn't
> > > > > space the records correctly (i.e. if the PDU end is not
> > > > > falling on a TLS record boundary).
> > > > > 
> > > > > But yeah, I can see the issue. While we can (and do)
> > > > > advance the iterator to complete the request, we still
> > > > > have the remaining data in the iterator.
> > > > 
> > > > > What we can do, though, is to copy the remaining data over
> > > > > to 'queue->pdu' (as we assume it's the start of the next
> > > > > PDU), set up pointers, and let it rip.
> > 
> > I think I understand this part a bit more now. I'm currently using
> > iov_iter_truncate() to reduce the count of req->iter to match
> > queue->data_remaining, so I don't see this.
> > 
> > If there was no truncation then the req->iter would fill up with
> > the current data and the next PDU. This is where we could copy
> > that data over and let it rip.
> > 
> > But it still has the same issue in that we can't add future data
> > to an offset in the req->iter. At least not that I can figure out.
> > 
> 
> Ah, I think I see it now.
> The iovec might indeed be larger than queue->data_remaining
> (data underflows are not that uncommon), and then recvmsg
> might indeed fill the iovec with more data than the PDU
> requires.
> 
> The old code had that bit:
>   		/* we can read only from what is left in this bio */
> 		recv_len = min_t(size_t, recv_len,
> 				iov_iter_count(&req->iter));

That's part of the problem. But the old code also passed an offset to
skb_copy_datagram_iter(), and the new code has no equivalent.
That's the issue.

> (with recv_len being set to queue->data_remaining) to prevent that
> from happening.
> So we should do a
> 
> if (iov_iter_count(&req->iter) > queue->data_remaining)

You don't need the if(); iov_iter_truncate() only shrinks the count when
it is larger than the new value, so it handles that for you.
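
For illustration only, a userspace sketch of that shrink-only behaviour.
struct toy_iter and toy_iter_truncate() are made-up names, not kernel
API; the real helper operates on struct iov_iter and this only models
its count:

```c
#include <stddef.h>

/* Toy stand-in (hypothetical) for struct iov_iter: only the byte count. */
struct toy_iter {
	size_t count;	/* bytes the iterator may still accept */
};

/* Shrink-only clamp: a larger 'count' leaves the iterator untouched. */
static void toy_iter_truncate(struct toy_iter *i, size_t count)
{
	if (i->count > count)
		i->count = count;
}
```

Calling it with a value above the current count is a no-op, which is
why the explicit comparison in the caller is redundant.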

>     iov_iter_truncate(&req->iter, queue->data_remaining);

I have this already locally. It stops req->iter from reading too much
data, but it doesn't fix the issue that the next receive will overwrite
the data in req->iter, as there is no offset.
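
To make the symptom concrete, here is a userspace toy model, not the
patch itself. Everything named toy_* is hypothetical. The model simply
assumes the write position is lost between two receives (keep_pos == 0)
to show the effect; whether that is what actually happens in the new
receive path is not established here:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy iterator: flat buffer, write position, byte limit. */
struct toy_iter {
	char *buf;
	size_t pos;	/* where the next copy lands */
	size_t count;	/* bytes still allowed from pos */
};

/* Shrink-only clamp, modelled on what iov_iter_truncate() does. */
static void toy_iter_truncate(struct toy_iter *i, size_t count)
{
	if (i->count > count)
		i->count = count;
}

/* Copy up to i->count bytes in and advance; returns bytes copied. */
static size_t toy_copy_to_iter(struct toy_iter *i, const char *src,
			       size_t len)
{
	if (len > i->count)
		len = i->count;
	memcpy(i->buf + i->pos, src, len);
	i->pos += len;
	i->count -= len;
	return len;
}

/* Receive an 8-byte payload as two 4-byte chunks into dst. */
static void toy_recv_two_chunks(char *dst, size_t dst_len, int keep_pos)
{
	const char *chunks[2] = { "AAAA", "BBBB" };
	size_t data_remaining = 8;	/* length announced by the PDU */
	struct toy_iter it = { .buf = dst, .pos = 0, .count = dst_len };
	int n;

	for (n = 0; n < 2; n++) {
		if (!keep_pos) {
			/* assumed failure mode: position is forgotten */
			it.pos = 0;
			it.count = dst_len;
		}
		/* the clamp limits the length, not the landing position */
		toy_iter_truncate(&it, data_remaining);
		data_remaining -= toy_copy_to_iter(&it, chunks[n], 4);
	}
}
```

With keep_pos set, a 16-byte dst ends up starting with "AAAABBBB";
without it, the second chunk lands on top of the first. The truncate
call is identical in both cases, which is the point: it bounds how much
is read, but says nothing about where the next read goes.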

Alistair

> 
> before issuing recvmsg. That should take care of things.
> 
> Cheers,
> 
> Hannes

