[PATCH RFC] nvme-tcp: Implement recvmsg() receive flow

Hannes Reinecke hare at suse.de
Wed Feb 25 07:02:50 PST 2026


On 2/25/26 14:15, Alistair Francis wrote:
> On Wed, 2026-02-25 at 12:41 +0100, Hannes Reinecke wrote:
>> On 2/25/26 11:56, Alistair Francis wrote:
[ .. ]
>>>
>>> This doesn't work unfortunately.
>>>
>>> The problem is what happens if queue->data_remaining is smaller
>>> than iov_iter_count(&req->iter)?
>>>
>>> queue->data_remaining is set by the data length in the c2h, while
>>> the length of the request iov_iter_count(&req->iter) is set when the
>>> request is submitted.
>>>
>>> If queue->data_remaining ends up being smaller than
>>> iov_iter_count(&req->iter) then we need to read less data than the
>>> actual count of req->iter.
>>>
>>> So we need an iov_iter_truncate(), but then we end up overwriting
>>> the data on the next iteration as we have no way to keep...
>>>
>> Question is, though: what _is_ in the remaining iov?
> 
> Which remaining iov?
> 

Well, if queue->
>> The most reasonable explanation would be that it's the start of the
>> next PDU (which we haven't accounted for, and hence haven't set
>> up pointers correctly).
> 
> The next PDU seems fine, it's just the next data that goes on the
> current (and correct) IOV, just overwriting the previous data as there
> is no offset.
> 

The offset is in the iov (i.e. you advance the iov iter to capture the
offset).
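To illustrate the point about the advance capturing the offset, here is a
minimal userspace sketch (this is not the driver's code; `struct req_iter`
and `iter_copy_and_advance` are hypothetical stand-ins for the request's
iov_iter and the copy_to_iter()/advance combination):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the request's iov_iter. */
struct req_iter {
	char  *buf;    /* destination buffer of the request */
	size_t count;  /* bytes still expected, like iov_iter_count() */
	size_t offset; /* implicit offset the iterator carries */
};

/* Copy "len" bytes into the iterator and advance it. Because the
 * advance records the offset, a second partial receive lands after
 * the first instead of overwriting it. */
static size_t iter_copy_and_advance(struct req_iter *it,
				    const char *src, size_t len)
{
	if (len > it->count)
		len = it->count;        /* never copy past the request */
	memcpy(it->buf + it->offset, src, len);
	it->offset += len;              /* the advance is the offset */
	it->count  -= len;
	return len;
}
```

Two back-to-back partial receives then fill the buffer contiguously,
which is the behaviour the advance is meant to guarantee.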

>> I can see this happening for TLS when the sender doesn't space
>> the records correctly (i.e. if the PDU end does not fall on a
>> TLS record boundary).
>>
>> But yeah, I can see the issue. While we can (and do)
>> advance the iterator to complete the request, we still
>> have the remaining data in the iterator.
> 
>> What we can do, though, is to copy the remaining data over
>> to 'queue->pdu' (as we assume it's the start of the next PDU),
>> set up pointers, and let it rip.
> 
> Ah, I think that's the other way around of what I'm talking about. That
> sounds more like you overflow the current iterator. I'm talking about
> an underflow, where we don't transfer enough data to complete a
> request. So when we continue the transfer we overwrite the previous
> data.
> 
To my understanding queue->data_remaining is the number of bytes left
for _this_ PDU (actually, for the PDU data; digests are counted
separately).
And the iterator points to the iovec storing the actual data.
So in your case iov_iter_count() is _larger_ than queue->data_remaining.
How so? The iterator count is calculated from the request, and that
should have the correct length set.
So the only way I see how we could arrive at a situation
where iov_iter_count() is larger than queue->data_remaining is when
we have a short read.
But really, both values are set up right at the start,
before we even start receiving data.
Shouldn't we catch this issue there?

I guess I'm still confused how you could end up in a situation
where queue->data_remaining is smaller than the iovec length...
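For what it's worth, the bound being argued about can be written down
directly. A userspace sketch (illustrative names, not the driver's code):
each receive must be clamped to both the per-PDU byte budget from the
C2HData header and the request iterator's residual count, whichever is
smaller:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper: how many bytes the next receive may consume.
 * "data_remaining" is the per-PDU budget (as set from the C2HData
 * header), "iter_count" the request iterator's residual count, and
 * "socket_avail" the bytes currently readable from the socket. */
static size_t recv_budget(size_t data_remaining, size_t iter_count,
			  size_t socket_avail)
{
	size_t len = socket_avail;

	if (len > data_remaining)
		len = data_remaining;   /* don't read into the next PDU */
	if (len > iter_count)
		len = iter_count;       /* don't overrun the request */
	return len;
}
```

If data_remaining can legitimately be smaller than the iterator count
(the underflow case being discussed), the first clamp is what forces the
short receive, and the iterator then has to carry the offset across into
the next C2HData PDU.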

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



More information about the Linux-nvme mailing list