nvme tcp receive errors

Keith Busch kbusch at kernel.org
Wed Mar 31 23:26:44 BST 2021


On Wed, Mar 31, 2021 at 03:16:19PM -0700, Sagi Grimberg wrote:
> 
> > > Hey Keith,
> > > 
> > > > While running a read-write mixed workload, we are observing errors like:
> > > > 
> > > >     nvme nvme4: queue 2 no space in request 0x1
> > > 
> > > This means that we get a data payload from a read request and
> > > we don't have a bio/bvec space to store it, which means we
> > > are probably not tracking the request iterator correctly if
> > > tcpdump shows that we are getting the right data length.
> > > 
> > > > Based on tcpdump, all data for this queue is expected to satisfy the
> > > > command request. I'm not familiar enough with the tcp interfaces, so
> > > > could anyone provide pointers on how to debug this further?
> > > 
> > > What was the size of the I/O that you were using? Is this easily
> > > reproducible?
> > > 
> > > Do you have the below applied:
> > > ca1ff67d0fb1 ("nvme-tcp: fix possible data corruption with bio merges")
> > > 0dc9edaf80ea ("nvme-tcp: pass multipage bvec to request iov_iter")
> > > 
> > > I'm assuming yes if you are using the latest nvme tree...
> > > 
> > > Does the issue still happen when you revert 0dc9edaf80ea?
> > 
> > Thanks for the reply.
> > 
> > This was observed on the recent 5.12-rc4, so it has all the latest tcp
> > fixes. I'll check with reverting 0dc9edaf80ea and see if that makes a
> > difference. It is currently reproducible, though it can take over an
> > hour right now.
> 
> What is the workload you are running? have an fio job file?
> Is this I/O to a raw block device? or with fs or iosched?

It's O_DIRECT to a raw block device using the libaio engine. No fs, page
cache, or io scheduler is involved.

The fio jobs are generated by a script that cycles through various block
sizes, read/write mixes, and IO depths. The error doesn't consistently
correlate with any particular set of parameters, though. I can get more
details if that would be helpful.
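
As a rough illustration, a single iteration of that cycle runs a job along
these lines; the device name is a placeholder and the block size, read/write
mix, and queue depth shown are just one point in the sweep:

    [global]
    ioengine=libaio
    direct=1
    # placeholder device - whichever namespace is under test
    filename=/dev/nvme4n1
    time_based=1
    runtime=60

    [randrw-job]
    # the generator script sweeps bs, rwmixread and iodepth
    rw=randrw
    rwmixread=70
    bs=32k
    iodepth=32
    numjobs=4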
 
> Also, I'm assuming that you are using Linux nvmet as the target
> device?

Not this time. The target is implemented in a hardware device.
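
For reference, my reading of where that message comes from, paraphrasing
from memory of nvme_tcp_recv_data() in drivers/nvme/host/tcp.c around
5.12 (trimmed, so don't take the exact names or error handling as
verbatim):

    while (true) {
            int recv_len, ret;

            recv_len = min_t(size_t, *len, queue->data_remaining);
            if (!recv_len)
                    break;

            /*
             * When the request's iov_iter is exhausted, move to the next
             * bio; if there is none, the controller is sending more data
             * than the request reserved space for.
             */
            if (!iov_iter_count(&req->iter)) {
                    req->curr_bio = req->curr_bio->bi_next;
                    if (!req->curr_bio) {
                            dev_err(queue->ctrl->ctrl.device,
                                    "queue %d no space in request %#x",
                                    nvme_tcp_queue_id(queue), rq->tag);
                            return -EIO;
                    }
                    nvme_tcp_init_iter(req, READ);
            }

            /* copy this chunk of the C2HData payload into the request */
            ret = skb_copy_datagram_iter(skb, *offset, &req->iter, recv_len);
            ...
    }

So if tcpdump really does show the expected amount of data on the wire,
hitting that branch lines up with your point about the request iterator
not being tracked correctly on the host side.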


