nvme tcp receive errors

Sagi Grimberg sagi at grimberg.me
Wed Mar 31 23:16:19 BST 2021


>> Hey Keith,
>>
>>> While running a read-write mixed workload, we are observing errors like:
>>>
>>>     nvme nvme4: queue 2 no space in request 0x1
>>
>> This means that we got a data payload for a read request and
>> we don't have bio/bvec space left to store it. If tcpdump
>> shows that we are receiving the correct data length, then we
>> are probably not tracking the request iterator correctly.
>>
>>> Based on tcpdump, all data for this queue is expected to satisfy the
>>> command request. I'm not familiar enough with the tcp interfaces, so
>>> could anyone provide pointers on how to debug this further?
>>
>> What was the size of the I/O that you were using? Is this easily
>> reproducible?
>>
>> Do you have the below applied:
>> ca1ff67d0fb1 ("nvme-tcp: fix possible data corruption with bio merges")
>> 0dc9edaf80ea ("nvme-tcp: pass multipage bvec to request iov_iter")
>>
>> I'm assuming yes if you are using the latest nvme tree...
>>
>> Does the issue still happen when you revert 0dc9edaf80ea?
> 
> Thanks for the reply.
> 
> This was observed on the recent 5.12-rc4, so it has all the latest tcp
> fixes. I'll try reverting 0dc9edaf80ea and see if that makes a
> difference. The issue is reproducible, though it currently takes over
> an hour to hit.
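
To recap what the "no space in request" message above indicates: the
receive path ran out of bvec space for incoming C2H data. Below is a
small user-space model of the accounting involved (this is not the
actual driver code; all names and sizes are made up) showing how an
iterator sized for less data than the controller actually sends
produces exactly this failure:

/*
 * Toy model (NOT the nvme-tcp driver) of the "no space in request"
 * condition: the host sizes an iterator over the request's bvecs for
 * the expected transfer length, and if more C2H data arrives than the
 * iterator can absorb, the receive path has nowhere to copy it and
 * must fail the request.
 */
#include <stdio.h>
#include <stddef.h>

struct toy_request {
	size_t data_len;	/* bytes the read command asked for */
	size_t iter_left;	/* space remaining in the bvec iterator */
};

/* Consume the payload of one incoming data PDU. */
static int toy_recv_data(struct toy_request *req, size_t pdu_len)
{
	while (pdu_len) {
		size_t chunk = pdu_len < req->iter_left ?
					pdu_len : req->iter_left;

		if (!chunk) {
			/* More data arrived than the request has space for. */
			fprintf(stderr, "no space in request (data_len=%zu)\n",
				req->data_len);
			return -1;
		}
		req->iter_left -= chunk;
		pdu_len -= chunk;
	}
	return 0;
}

int main(void)
{
	/*
	 * An 8k read whose iterator only covers 4k: the second data PDU
	 * has nowhere to go, the same symptom as an iterator-tracking bug.
	 */
	struct toy_request req = { .data_len = 8192, .iter_left = 4096 };

	toy_recv_data(&req, 4096);		/* fits */
	return toy_recv_data(&req, 4096);	/* triggers the error */
}

If tcpdump shows the controller sending exactly the requested length,
then the iterator/bvec side of that accounting is the likely suspect.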

What is the workload you are running? Do you have an fio job file you
can share? Is this I/O directly to a raw block device, or does it go
through a filesystem or an I/O scheduler?
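
For example, a job file along these lines would give us enough to try
to reproduce (everything below is a placeholder, not taken from your
setup; /dev/nvme4n1 is just an example namespace):

[global]
ioengine=libaio
direct=1
rw=randrw
rwmixread=70
bs=4k
iodepth=32
numjobs=4
runtime=3600
time_based=1

[mixed-rw]
filename=/dev/nvme4n1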

Also, I'm assuming that you are using Linux nvmet as the target
device?


