nvme-tcp: i/o errors and stalled host from failure to send command pdus
Sagi Grimberg
sagi at grimberg.me
Wed Aug 31 16:42:06 PDT 2022
>
> Here's a trace including state from nvme_tcp_try_send_cmd_pdu(), like so.
>
> req->dbg.send = 1;
> ret = kernel_sendpage(queue->sock, virt_to_page(pdu), offset, len, flags);
> req->dbg.sendpage = ret;
>
>
> nvme-trace.out
>
> tar-1785 [005] ..... 232.768892: nvme_tcp_queue_rq:
> nvme1: qid=6 tag=18 op=0 data_len=20480
> tar-1785 [005] ..... 232.768894: nvme_tcp_queue_rq:
> nvme1: qid=6 tag=17 op=0 data_len=32768
> tar-1785 [005] .N... 232.768895: nvme_tcp_queue_rq:
> nvme1: qid=6 tag=14 op=0 data_len=32768
> kworker/5:1H-475 [005] ..... 232.768923: nvme_tcp_queue_rq:
> nvme1: qid=6 tag=13 op=0 data_len=32768
> kworker/5:1H-475 [005] ..... 232.768924: nvme_tcp_queue_rq:
> nvme1: qid=6 tag=12 op=0 data_len=12288
> tar-1785 [007] ..... 232.769141: nvme_tcp_queue_rq:
> nvme1: qid=8 tag=69 op=0 data_len=20480
>
>
> dmesg.out (reordered to match nvme-trace.out)
>
> [ 262.889536] nvme nvme1: state: tag 0x18 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> empty 1 lock 1 last 0 more 0 send 0 sendpage -1
Something is not adding up...
lock=1 means that we got into nvme_tcp_send_all(), yet send=0 shows that
nvme_tcp_try_send_cmd_pdu() never ran for this request. So somehow we
were not able to send it to the wire?
I'd look into this and add an indication of when the request is pulled
from the req_list, and in turn from the send_list. It looks like it is
somehow picked off and silently dropped, because as long as it is on the
list(s), another context should have picked it up at some point...
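
To illustrate the kind of indication meant here, below is a rough
user-space sketch, not driver code. It models a LIFO req_list that is
spliced into a FIFO send_list, and marks each request as it crosses each
list. Every name in it (struct req_dbg, on_req_list, pulled_req_list,
pulled_send_list, queue_request, process_req_list, fetch_request) is
hypothetical and only loosely mirrors the driver; the real lists use the
kernel's llist and list_head helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug state, in the spirit of the req->dbg fields above. */
struct req_dbg {
	bool on_req_list;      /* request was queued on req_list */
	bool pulled_req_list;  /* request was moved from req_list to send_list */
	bool pulled_send_list; /* request was fetched from send_list for sending */
};

struct request {
	int tag;
	struct req_dbg dbg;
	struct request *next;  /* link for whichever list holds the request */
};

struct queue {
	struct request *req_list;  /* LIFO, newest first (stand-in for an llist) */
	struct request *send_head; /* FIFO, oldest first */
	struct request *send_tail;
};

/* Submission side: push the request onto req_list and record it. */
static void queue_request(struct queue *q, struct request *req)
{
	req->dbg.on_req_list = true;
	req->next = q->req_list;
	q->req_list = req;
}

/* Move everything from req_list to send_list in submission order. */
static void process_req_list(struct queue *q)
{
	struct request *node = q->req_list, *rev = NULL;

	q->req_list = NULL;

	/* req_list is newest-first, so reverse it to get oldest-first. */
	while (node) {
		struct request *next = node->next;

		node->next = rev;
		rev = node;
		node = next;
	}

	/* Append each request to the tail of send_list and mark it. */
	while (rev) {
		struct request *next = rev->next;

		rev->dbg.pulled_req_list = true;
		rev->next = NULL;
		if (q->send_tail)
			q->send_tail->next = rev;
		else
			q->send_head = rev;
		q->send_tail = rev;
		rev = next;
	}
}

/* Send side: take the oldest request, refilling send_list if it is empty. */
static struct request *fetch_request(struct queue *q)
{
	struct request *req;

	if (!q->send_head)
		process_req_list(q);

	req = q->send_head;
	if (!req)
		return NULL;

	q->send_head = req->next;
	if (!q->send_head)
		q->send_tail = NULL;
	req->next = NULL;
	req->dbg.pulled_send_list = true;
	return req;
}
```

With flags like these dumped next to the existing state, a stalled
request showing pulled_send_list=1 but send=0 would have been fetched
and then dropped, while pulled_req_list=0 would mean it never left the
req_list.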
> [ 262.889337] nvme nvme1: state: tag 0x17 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> [ 262.889110] nvme nvme1: state: tag 0x14 io_cpu 5 smp_id 5 smp_id2 7 sync 1
> empty 1 lock 1 last 0 more 0 send 0 sendpage -1
> [ 262.888864] nvme nvme1: state: tag 0x13 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> empty 0 lock 0 last 0 more 1 send 1 sendpage 72
> [ 262.888727] nvme nvme1: state: tag 0x12 io_cpu 5 smp_id 5 smp_id2 5 sync 1
> empty 0 lock 0 last 1 more 1 send 1 sendpage 72