nvme tcp receive errors

Wed Mar 31 21:49:58 BST 2021

On Wed, Mar 31, 2021 at 12:10:55PM -0700, Sagi Grimberg wrote:
> Hey Keith,
> 
> > While running a read-write mixed workload, we are observing errors like:
> > 
> >    nvme nvme4: queue 2 no space in request 0x1
> 
> This means that we get a data payload from a read request and
> we don't have a bio/bvec space to store it, which means we
> are probably not tracking the request iterator correctly if
> tcpdump shows that we are getting the right data length.
> 
> > Based on tcpdump, all data for this queue is expected to satisfy the
> > command request. I'm not familiar enough with the tcp interfaces, so
> > could anyone provide pointers on how to debug this further?
> 
> What was the size of the I/O that you were using? Is this easily
> reproducible?
> 
> Do you have the below applied:
> ca1ff67d0fb1 ("nvme-tcp: fix possible data corruption with bio merges")
> 0dc9edaf80ea ("nvme-tcp: pass multipage bvec to request iov_iter")
> 
> I'm assuming yes if you are using the latest nvme tree...
> 
> Does the issue still happen when you revert 0dc9edaf80ea?

Thanks for the reply.

This was observed on 5.12-rc4, so it has all the latest tcp fixes.
I'll revert 0dc9edaf80ea and see whether that makes a difference. It
is reproducible, though it can currently take over an hour to hit.
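To make the "no space in request" explanation above concrete, here is a
minimal user-space C sketch. It assumes a simplified model in which a
read request's buffer is a short list of segments standing in for the
chained bios/bvecs. All names (struct seg, struct req_iter,
req_iter_init, req_iter_remaining, recv_data) are hypothetical and are
not the nvme-tcp driver's code. It only shows why a mis-tracked
iterator reports "no space" even when the wire carries exactly the
expected data length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model, not kernel types: one segment of a read buffer,
 * standing in for a bio/bvec. */
#define MAX_SEGS 4

struct seg {
    char  *buf;
    size_t len;
};

struct req_iter {
    struct seg segs[MAX_SEGS];
    size_t     nr_segs;
    size_t     cur;  /* index of the segment currently being filled */
    size_t     off;  /* bytes already written into segs[cur] */
};

static void req_iter_init(struct req_iter *it, const struct seg *segs, size_t n)
{
    assert(n <= MAX_SEGS);
    memcpy(it->segs, segs, n * sizeof(*segs));
    it->nr_segs = n;
    it->cur = 0;
    it->off = 0;
}

/* Bytes of space the iterator believes are still available. */
static size_t req_iter_remaining(const struct req_iter *it)
{
    size_t total = 0;
    for (size_t i = it->cur; i < it->nr_segs; i++)
        total += it->segs[i].len;
    return total - (it->cur < it->nr_segs ? it->off : 0);
}

/* Copy one received data chunk into the request. Returns 0 on success,
 * or -1 if the payload overruns the space the iterator is tracking. */
static int recv_data(struct req_iter *it, const char *data, size_t len)
{
    while (len > 0) {
        /* Advance past any segment that is already full. */
        while (it->cur < it->nr_segs && it->off == it->segs[it->cur].len) {
            it->cur++;
            it->off = 0;
        }
        if (it->cur == it->nr_segs) {
            fprintf(stderr, "no space in request (%zu bytes left over)\n", len);
            return -1;
        }
        struct seg *s = &it->segs[it->cur];
        size_t n = s->len - it->off;
        if (n > len)
            n = len;
        memcpy(s->buf + it->off, data, n);
        it->off += n;
        data += n;
        len -= n;
    }
    return 0;
}
```

For example, with segments of 4 and 8 bytes, a 12-byte payload fits.
If the iterator is initialised with only the first segment, the same
12 bytes fail after 4 are copied, which is the shape of the error
reported above.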