nvme tcp receive errors
Keith Busch
kbusch at kernel.org
Wed Mar 31 21:49:58 BST 2021
On Wed, Mar 31, 2021 at 12:10:55PM -0700, Sagi Grimberg wrote:
> Hey Keith,
>
> > While running a read-write mixed workload, we are observing errors like:
> >
> > nvme nvme4: queue 2 no space in request 0x1
>
> This means that we get a data payload from a read request and
> we don't have a bio/bvec space to store it, which means we
> are probably not tracking the request iterator correctly if
> tcpdump shows that we are getting the right data length.
>
> > Based on tcpdump, all data for this queue is expected to satisfy the
> > command request. I'm not familiar enough with the tcp interfaces, so
> > could anyone provide pointers on how to debug this further?
>
> What was the size of the I/O that you were using? Is this easily
> reproducible?
>
> Do you have the below applied:
> ca1ff67d0fb1 ("nvme-tcp: fix possible data corruption with bio merges")
> 0dc9edaf80ea ("nvme-tcp: pass multipage bvec to request iov_iter")
>
> I'm assuming yes if you are using the latest nvme tree...
>
> Does the issue still happen when you revert 0dc9edaf80ea?

Thanks for the reply.

This was observed on the recent 5.12-rc4, so it has all the latest tcp
fixes. I'll try reverting 0dc9edaf80ea and see if that makes a
difference. The issue is reproducible, though it can currently take over
an hour to trigger.
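
To make the failure condition described above concrete, here is a toy
userspace model of a receive path that runs out of destination space.
It is only a sketch, not the driver code: the names toy_seg, toy_req and
toy_recv_data are hypothetical, each toy_seg merely stands in for one
bio's worth of buffer space, and the real driver tracks this through an
iov_iter rather than a plain offset.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one bio: a destination buffer and its size. */
struct toy_seg {
	char *buf;
	size_t len;
	struct toy_seg *next;
};

/* Hypothetical stand-in for the per-request receive state. */
struct toy_req {
	struct toy_seg *curr;	/* segment currently being filled */
	size_t off;		/* bytes already written into curr */
};

/*
 * Copy len bytes of received payload into the request's segments.
 * Returns 0 on success, or -1 if payload remains after every segment
 * has been filled, which is the "no space in request" condition.
 */
static int toy_recv_data(struct toy_req *req, const char *data,
			 size_t len, unsigned int tag)
{
	while (len) {
		size_t n;

		/* Move past any segment that is already full. */
		while (req->curr && req->off == req->curr->len) {
			req->curr = req->curr->next;
			req->off = 0;
		}

		if (!req->curr) {
			fprintf(stderr, "toy: no space in request %#x\n", tag);
			return -1;
		}

		/* Copy as much as fits into the current segment. */
		n = req->curr->len - req->off;
		if (n > len)
			n = len;
		memcpy(req->curr->buf + req->off, data, n);
		req->off += n;
		data += n;
		len -= n;
	}
	return 0;
}
```

In this model, the error fires whenever the tracked position (curr/off)
reaches the end of the request before the payload does. That can happen
even when the length on the wire is correct, if the position was advanced
too far or the segments were set up shorter than the command's transfer
length, which is why the tracking of the request iterator is the first
suspect.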