[PATCH RFC] nvme-tcp: Implement recvmsg() receive flow

Alistair Francis Alistair.Francis at wdc.com
Wed Feb 25 15:37:04 PST 2026


On Wed, 2026-02-25 at 16:02 +0100, Hannes Reinecke wrote:
> On 2/25/26 14:15, Alistair Francis wrote:
> > On Wed, 2026-02-25 at 12:41 +0100, Hannes Reinecke wrote:
> > > On 2/25/26 11:56, Alistair Francis wrote:
> [ .. ]
> > > > 
> > > > This doesn't work unfortunately.
> > > > 
> > > > The problem is what happens if queue->data_remaining is smaller
> > > > than iov_iter_count(&req->iter)?
> > > > 
> > > > queue->data_remaining is set by the data length in the c2h,
> > > > while
> > > > the length of the request iov_iter_count(&req->iter) is set
> > > > when the
> > > > request is submitted.
> > > > 
> > > > If queue->data_remaining ends up being smaller than
> > > > iov_iter_count(&req->iter) then we need to read less data than
> > > > the
> > > > actual count of req->iter.
> > > > 
> > > > So we need an iov_iter_truncate(), but then we end up
> > > > overwriting
> > > > the data on the next iteration as we have no way to keep...
> > > > 
> > > Question is, though: what _is_ in the remaining iov?
> > 
> > Which remaining iov?
> > 
> 
> Well, if queue->
> > > The most reasonable explanation would be that it's the start of
> > > the
> > > next PDU (which we haven't accounted for, and hence haven't set
> > > up pointers correctly).
> > 
> > The next PDU seems fine, it's just the next data that goes on the
> > current (and correct) IOV, just overwriting the previous data as
> > there
> > is no offset.
> > 
> 
> The offset is in the iov (ie you advance the iov iter to capture the
> offset)
> 
> > > I can see this happening for TLS when the sender doesn't space
> > > the records correctly (ie if the PDU end is not falling on a
> > > TLS record boundary).
> > > 
> > > But yeah, I can see the issue. While we can (and do)
> > > advance the iterator to complete the request, we still
> > > have the remaining data in the iterator.
> > 
> > > What we can do, though, is to copy the remaining data over
> > > to 'queue->pdu' (as we assume it's the start of the next PDU),
> > > set up pointers, and let it rip.

I think I understand this part a bit more now. I'm currently using
iov_iter_truncate() to reduce the count of req->iter to match
queue->data_remaining, so I don't see this.

If there was no truncation then the req->iter would fill up with the
current data and the next PDU. This is where we could copy that data
over and let it rip.

But it still has the same issue in that we can't add future data to an
offset in the req->iter. At least not that I can figure out.

> > 
> > Ah, I think that's the other way around of what I'm talking about.
> > That
> > sounds more like you overflow the current iterator. I'm talking
> > about
> > an underflow, where we don't transfer enough data to complete a
> > request. So when we continue the transfer we overwrite the previous
> > data.
> > 
> To my understanding queue->remaining is the number of bytes left
> for _this_ PDU (actually, for the PDU data; digests are counted
> separately).

Yeah, that's correct.

> And the iterator points to the iovec storing the eventual data.
> So in your case iov_len() is _larger_ than queue->remaining.
> How so? iov_len() is calculated from the request, and that
> should have the correct length set.

What I'm seeing is that nvme_submit_sync_cmd() is called with a length
of 4096 (called from nvme_identify_ctrl()).

That results in a req->iter with a count of 4096.

The data from the device is sent in 3 transfers (the data_remaining
values):

1412 bytes
1412 bytes
1272 bytes

They all target the same req->iter.

In the previous code the offset handled this fine, but we no longer
have that.
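
As a rough userspace sketch of the above (this is not the kernel code;
copy_chunk() and simulate() are made-up names, and a flat buffer stands
in for req->iter), the three transfers only fill the 4096-byte request
if each copy starts where the previous one ended:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_LEN 4096

/* Hypothetical helper: copy one C2H data chunk into the request buffer
 * at the given offset and return the offset just past the copied data. */
static size_t copy_chunk(unsigned char *req_buf, size_t offset,
			 const unsigned char *chunk, size_t len)
{
	assert(offset + len <= REQ_LEN);
	memcpy(req_buf + offset, chunk, len);
	return offset + len;
}

/* Replay the three transfers seen for nvme_identify_ctrl() and return
 * how many bytes of the request buffer ended up holding the right data.
 * With track_offset == 0 every chunk is written at offset 0. */
size_t simulate(int track_offset)
{
	static const size_t chunk_len[] = { 1412, 1412, 1272 };
	unsigned char src[REQ_LEN], req_buf[REQ_LEN];
	size_t sent = 0, offset = 0, ok = 0, i;

	/* Non-zero pattern so untouched (zeroed) bytes never match. */
	for (i = 0; i < REQ_LEN; i++)
		src[i] = (unsigned char)(i % 251 + 1);
	memset(req_buf, 0, sizeof(req_buf));

	for (i = 0; i < 3; i++) {
		size_t next = copy_chunk(req_buf, offset, src + sent,
					 chunk_len[i]);

		sent += chunk_len[i];
		if (track_offset)
			offset = next;
	}

	for (i = 0; i < REQ_LEN; i++)
		ok += (req_buf[i] == src[i]);
	return ok;
}
```

simulate(1) ends with all 4096 bytes in place, since 1412 + 1412 + 1272
= 4096. simulate(0) keeps writing at the start of the buffer, so later
chunks land on top of earlier ones and the tail is never filled.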

> So the only way I see how we could arrive in the situation
> where iov_len() is larger than queue->remaining is when
> we have a short read.
> But really, both values are setup right at the start
> before we even start receiving data.
> Shouldn't we catch this issue there?

This happens really early as part of nvme_identify_ctrl()

Alistair

> 
> I guess I'm still confused how you could end up in a situation
> where queue->remaining is smaller than the iovec length...
> 
> Cheers,
> 
> Hannes

