nvme tcp receive errors

Keith Busch kbusch at kernel.org
Fri Apr 2 18:11:41 BST 2021


On Thu, Apr 01, 2021 at 05:49:58AM +0900, Keith Busch wrote:
> On Wed, Mar 31, 2021 at 12:10:55PM -0700, Sagi Grimberg wrote:
> > Hey Keith,
> > 
> > > While running a read-write mixed workload, we are observing errors like:
> > > 
> > >    nvme nvme4: queue 2 no space in request 0x1
> > 
> > This means that we get a data payload from a read request and
> > we don't have a bio/bvec space to store it, which means we
> > are probably not tracking the request iterator correctly if
> > tcpdump shows that we are getting the right data length.
> > 
> > > Based on tcpdump, all data for this queue is expected to satisfy the
> > > command request. I'm not familiar enough with the tcp interfaces, so
> > > could anyone provide pointers on how to debug this further?
> > 
> > What was the size of the I/O that you were using? Is this easily
> > reproducible?
> > 
> > Do you have the below applied:
> > ca1ff67d0fb1 ("nvme-tcp: fix possible data corruption with bio merges")
> > 0dc9edaf80ea ("nvme-tcp: pass multipage bvec to request iov_iter")
> > 
> > I'm assuming yes if you are using the latest nvme tree...
> > 
> > Does the issue still happen when you revert 0dc9edaf80ea?
> 
> Thanks for the reply.
> 
> This was observed on the recent 5.12-rc4, so it has all the latest tcp
> fixes. I'll check with reverting 0dc9edaf80ea and see if that makes a
> difference. It is currently reproducible, though it can take over an
> hour right now.
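
To make the quoted "no space" condition concrete, here is a stand-alone
user-space model of the bookkeeping it describes. This is not the driver
code and every name in it is invented: a read request has a fixed set of
segment lengths, each received data PDU advances an iterator over them,
and the error fires when a payload arrives that the iterator no longer
has room for.

/*
 * Stand-alone model, NOT the nvme-tcp driver code: every name below is
 * invented.  It only illustrates the bookkeeping behind the message
 * "no space in request": data arriving for a read is copied into the
 * request's segments by advancing an iterator, and if more payload
 * shows up than the iterator has capacity left for, there is nowhere
 * to put it.
 */
#include <stdio.h>
#include <stddef.h>

struct fake_request {
    size_t segment_len[4];      /* stand-in for the request's bvec lengths */
    unsigned int nr_segments;
    size_t consumed;            /* bytes already placed: iterator progress */
};

static size_t request_capacity(const struct fake_request *req)
{
    size_t total = 0;

    for (unsigned int i = 0; i < req->nr_segments; i++)
        total += req->segment_len[i];
    return total;
}

/* Returns 0 on success, -1 when the payload does not fit ("no space"). */
static int receive_data_pdu(struct fake_request *req, size_t pdu_len)
{
    size_t remaining = request_capacity(req) - req->consumed;

    if (pdu_len > remaining) {
        fprintf(stderr, "no space in request: pdu=%zu remaining=%zu\n",
                pdu_len, remaining);
        return -1;
    }
    req->consumed += pdu_len;   /* the iov_iter_advance() step, roughly */
    return 0;
}

int main(void)
{
    /* A 16KiB read split over four 4KiB segments. */
    struct fake_request req = {
        .segment_len = { 4096, 4096, 4096, 4096 },
        .nr_segments = 4,
    };

    /* Four 4KiB data PDUs exactly fill the request... */
    for (int i = 0; i < 4; i++)
        receive_data_pdu(&req, 4096);

    /*
     * ...so if the iterator was set up against the wrong bio, or not
     * re-initialized where it should have been, an otherwise valid PDU
     * lands here and triggers the error.
     */
    receive_data_pdu(&req, 4096);
    return 0;
}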

After reverting 0dc9edaf80ea, we are observing a kernel panic (below). We'll
try adding it back, plus applying your debug patch.

  BUG: kernel NULL pointer dereference, address: 000000000000002c
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0 
  Oops: 0000 [#1] SMP PTI
  CPU: 15 PID: 49319 Comm: fio Tainted: G           OE     5.12.0-051200rc4-generic #202103212230
  Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0a 08/02/2016
  RIP: 0010:nvme_tcp_init_iter+0x55/0xf0 [nvme_tcp]
  Code: ff ff b9 01 00 00 00 45 31 e4 48 8d 7b 68 4c 89 da 44 89 d6 e8 ec e4 16 c8 4c 89 63 70 5b 41 5c 41 5d 41 5e 5d c3 48 8b 57 60 <8b> 42 2c 4c 8b 6a 78 44 8b 42 28 44 8b 62 30 49 89 c3 48 89 c7 49
  RSP: 0018:ffffaafccd0eb920 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff9ad834bfcf28 RCX: 0000000000002000
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9ad834bfcf28
  RBP: ffffaafccd0eb940 R08: 0000000000002000 R09: 0000000000000000
  R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000002000
  R13: fffff67f07c1ce00 R14: ffff9ad78422c5f0 R15: ffff9ad834bfcf28
  FS:  00007f010087d740(0000) GS:ffff9adedfdc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000000000002c CR3: 000000027b11e005 CR4: 00000000003706e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   nvme_tcp_try_send_data+0x282/0x2e0 [nvme_tcp]
   ? nvme_tcp_init_iter+0x44/0xf0 [nvme_tcp]
   nvme_tcp_try_send+0x12f/0x1b0 [nvme_tcp]
   ? blk_mq_start_request+0x3a/0x100
   nvme_tcp_queue_rq+0x159/0x180 [nvme_tcp]
   __blk_mq_try_issue_directly+0x116/0x1e0
   blk_mq_try_issue_list_directly+0x174/0x2b0
   blk_mq_sched_insert_requests+0xa5/0xf0
   blk_mq_flush_plug_list+0x106/0x1b0
   blk_flush_plug_list+0xdd/0x100
   blk_finish_plug+0x29/0x40
   __blkdev_direct_IO+0x2ef/0x480
   ? aio_fsync_work+0xf0/0xf0
   blkdev_direct_IO+0x56/0x80
   generic_file_read_iter+0x9c/0x140
   blkdev_read_iter+0x35/0x40
   aio_read+0xe0/0x1a0
   ? __cond_resched+0x35/0x50
   ? slab_pre_alloc_hook.constprop.0+0x96/0xe0
   __io_submit_one.constprop.0+0x107/0x1f0
   io_submit_one+0xe3/0x3a0
   __x64_sys_io_submit+0x84/0x180
   do_syscall_64+0x38/0x90
   entry_SYSCALL_64_after_hwframe+0x44/0xae
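
For reference on the oops itself: the faulting code bytes "8b 42 2c" are a
32-bit load from offset 0x2c of %rdx, the register dump shows RDX = 0, and
%rdx was itself loaded from offset 0x60 of %rdi just before ("48 8b 57 60").
So nvme_tcp_init_iter appears to be reading a field at offset 0x2c through a
pointer that is NULL for this request, which is why the fault address is the
tiny 000000000000002c. A stand-alone sketch of that general pattern follows;
the struct layout is invented purely for illustration.

/*
 * Invented layout, NOT the kernel's struct bio.  It only demonstrates
 * that reading a field through a NULL struct pointer faults at an
 * address equal to the field's offset, which is why the oops reports a
 * small address instead of a wild one.
 */
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

struct fake_bvec_iter {
    uint64_t sector;
    uint32_t size;
    uint32_t idx;
    uint32_t bvec_done;
};

struct fake_bio {
    struct fake_bio *next;
    uint64_t flags;
    uint32_t opf;
    struct fake_bvec_iter iter;
};

int main(void)
{
    const struct fake_bio *bio = NULL;

    /*
     * Dereferencing bio->iter.size here would fault at exactly the
     * offset printed below; the offset is printed rather than hit.
     */
    printf("a NULL bio would fault at address %#zx\n",
           offsetof(struct fake_bio, iter.size));
    (void)bio;
    return 0;
}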


